Sequence Classifiers in C#: Hidden Conditional Random Fields

Some months after a preliminary article on hidden Markov models, I have finally posted the article on Hidden Conditional Random Fields (HCRF) on CodeProject. The HCRF is a discriminative model, forming a generative-discriminative pair with the hidden Markov model classifiers.

This CodeProject article is the second in a series about sequence classification, the first being about Hidden Markov Models. I have used this opportunity to write a little about generative versus discriminative models, and also to provide a brief discussion of how Vapnik’s ideas apply to these learning paradigms.

All the code presented in those articles is also available within the Accord.NET Framework. The articles provide good examples of how to use the framework, and can be regarded as practical guides on how to use those models with it.

Complete framework documentation can be found live at the project’s website, as well as in the framework’s GitHub wiki. The framework has now been referenced in 30+ publications over the years, and several more are already in the works, by me and by users around the world.

Academic publications

Speaking of publications, the framework has been used within my own research on computer vision. If you need help in understanding the inner workings of the HCRF, a more visual explanation of the HCRF derivation can be found in the presentation I gave at Iberamia 2012 about Fingerspelling Recognition with Support Vector Machines and Hidden Conditional Random Fields.

An application to a more interesting problem, namely natural words drawn from sign languages using a Microsoft Kinect, has also been accepted for publication at the 9th International Conference on Machine Learning and Data Mining (MLDM 2013) and will be available soon. Update: it is now available.

As usual, hope you find it interesting!

Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) for Unconstrained Optimization in C#


The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method is an optimization method belonging to the family of quasi-Newton methods for unconstrained non-linear optimization. In short, it is an off-the-shelf optimizer for seeking minimum or maximum points of any differentiable, possibly non-linear function, requiring only an expression for the function and its gradient. The goal of this article is to provide and demonstrate a C# port of the L-BFGS method originally written by Jorge Nocedal in Fortran.

Introduction

Function optimization is a common problem found in many numerical applications. Suppose we have a differentiable function f : Rⁿ → R and we would like to find its minimum or maximum value over its space of possible input values. Such problems arise often in parameter optimization of machine learning models and other statistics-related applications (and in a myriad of other applications too; however, we will pay special attention to cases related to machine learning).

The problem of maximizing or minimizing a function can be stated simply as max f(x) or min f(x). When there are no constraints limiting the possible values of x, it is widely known from calculus that the maximum or minimum of such a function occurs where the first derivative f′(x) of the function f(x) with respect to x is zero. Newton’s method is a common method for finding the roots of a differentiable function, and is thus a suitable method for finding those points at which the first derivatives are zero.

 

Surface (left) and contour lines (right) of the example function f(x,y) = exp{-(x-1)²} + exp{-(y-2)²/2}, a convex but non-linear function. Images created using Wolfram Alpha.

 

However, while Newton’s method may work very well with functions of a single variable, the generalization to higher dimensions requires replacing the first derivative f′(x) with the function’s gradient vector g of partial derivatives, and the second derivative f′′(x) with the inverse Hessian matrix H⁻¹. The problem is that the Hessian is a square matrix with as many rows and columns as parameters of our function f, and, besides being very costly and challenging to compute, even its size may be infeasible to accommodate in main memory when dealing with very high-dimensional problems.

To overcome those limitations, several methods attempt to replace the Hessian matrix with an approximation built from the first partial derivatives alone. Such methods are called quasi-Newton methods, as they replace H in Newton’s method by an approximation instead of using the full Hessian. In case the function being optimized has a particular form or special characteristic which can be exploited, it is also possible to use more specialized methods such as Gauss-Newton, Levenberg-Marquardt, or even lower-bound approximation methods to avoid computing the full Hessian.

The quasi-Newton BFGS method for non-linear optimization builds an approximation of the Hessian matrix using estimates extracted solely from first-order information. However, even though this is cheaper to compute, the memory requirements are still high, as the method needs to store a dense N×N matrix for the inverse Hessian approximation (where N is the number of variables of the function). To overcome this problem, the L-BFGS method, the limited-memory version of BFGS, never forms or stores the approximation matrix explicitly, but only a few vectors which represent it implicitly.

Source code

The code presented here is based upon a direct translation of the original code by Jorge Nocedal. His original code was released under the permissive BSD license, and honoring the original license and the author’s considerations, this code is released under the BSD as well.

L-BFGS Class diagrams

The L-BFGS method is implemented within the BroydenFletcherGoldfarbShanno class. Upon object creation, it is possible to specify the function and its gradient either through lambda expressions or through delegates to named methods. After the object has been initialized, a simple call to Minimize runs the standard L-BFGS method until convergence. The details of the Minimize method are given in the sections below.

The version of the code detailed here only supports minimization problems. However, any maximization problem can be converted into a minimization problem (and vice-versa) by taking the opposite of the function and its gradient.

Using the code

Code usage is rather simple. Suppose we would like to maximize the function g(x,y) = exp{-(x-1)²} + exp{-(y-2)²/2}.

Since we would like to perform a maximization, we first have to convert it to a minimization problem. The minimization version of the function is given simply by taking f(x,y) = −g(x,y).

This is a convex function, as can be seen by plotting its surface. The minimum of the function lies at the point (x,y) = (1,2). As expected, this point coincides with the roots of the partial derivative functions, as shown in the plots below:

 

The example optimization cast as a minimization problem: min f(x,y) = −(exp(−(x−1)²) + exp(−(y−2)²/2)). Computed by Wolfram Alpha.

Partial derivatives for x (left) and y (right): the roots of the partial derivatives with respect to x and y occur at 1 and 2, respectively. Images computed using Wolfram Alpha.

 

The first step in solving this optimization problem automatically is to specify the target function. The target function f can, in this case, be specified through a lambda function:
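A minimal sketch of such a lambda:

    // The target function f(x,y) = -g(x,y) to be minimized
    Func<double[], double> f = (x) =>
        -Math.Exp(-Math.Pow(x[0] - 1, 2)) - Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2));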

The next step is to specify the gradient g for the function f. In this particular case, we can manually compute the partial derivatives to be df/dx = 2 e^(-(x-1)²) (x-1) and df/dy = e^(-(y-2)²/2) (y-2), with respect to x and y respectively (note that the signs flip when negating g). Writing a lambda function to compute the gradient vector g, we have:
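A sketch of the corresponding lambda:

    // Gradient vector of f, with one partial derivative per variable
    Func<double[], double[]> g = (x) => new double[]
    {
        // df/dx = 2 e^(-(x-1)^2) (x-1)
        2 * Math.Exp(-Math.Pow(x[0] - 1, 2)) * (x[0] - 1),

        // df/dy = e^(-(y-2)^2 / 2) (y-2)
        Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2)) * (x[1] - 2)
    };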

We can also note that this example was rather simple, so the gradient vector was easy to calculate. However, the gradient could also have been computed automatically using Accord.NET’s FiniteDifferences class. In either case, all we have to do now is create our L-BFGS solver and call its Minimize() method to begin the optimization:
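A sketch, assuming the constructor from the framework version of this period:

    // Create the L-BFGS solver for a function of two variables
    var lbfgs = new BroydenFletcherGoldfarbShanno(numberOfVariables: 2,
        function: f, gradient: g);

    // Run the optimization until convergence
    lbfgs.Minimize();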

After the optimization is complete, the solution will be available in the Solution property of the lbfgs object, and will be equal to { 1.0, 2.0 }. In case one is interested in progress reports (such as when optimizing very large functions), it is also possible to register a listener for the Progress event. The complete version of the sample application accompanying the source code is given below:
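A self-contained sketch of the complete sample, under the same API assumptions as above:

    using System;
    using Accord.Math.Optimization;

    class Program
    {
        static void Main()
        {
            // Objective: minimize f(x,y) = -exp(-(x-1)^2) - exp(-(y-2)^2/2)
            Func<double[], double> f = (x) =>
                -Math.Exp(-Math.Pow(x[0] - 1, 2)) - Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2));

            Func<double[], double[]> g = (x) => new double[]
            {
                2 * Math.Exp(-Math.Pow(x[0] - 1, 2)) * (x[0] - 1),
                Math.Exp(-0.5 * Math.Pow(x[1] - 2, 2)) * (x[1] - 2)
            };

            var lbfgs = new BroydenFletcherGoldfarbShanno(2, f, g);
            lbfgs.Minimize();

            double[] solution = lbfgs.Solution;  // expected: { 1.0, 2.0 }
            double minimum = f(solution);        // expected: -2

            Console.WriteLine("Minimum of {0} at ({1}, {2})",
                minimum, solution[0], solution[1]);
        }
    }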

Conclusion

In this post, we showed how to use a reduced set of the Accord.NET Framework’s features to perform non-linear optimization. This routine is also used by the Conditional Random Field and Hidden Conditional Random Field trainers to optimize the parameters of those models. The solver routines have been adapted from Nocedal’s original Fortran source code which, tracing back to a 2001 message from the author, has also been reported to be available in the public domain. Nevertheless, the reduced set of the framework available for download within this post is distributed under a BSD license, as an alternative to the version available within the Accord.NET Framework, which is available only under the LGPL.

As always, I hope someone will find it useful 🙂

References

Decision Trees in C#


Decision trees are simple predictive models which map input attributes to a target value using simple conditional rules. Trees are commonly used in problems whose solutions must be readily understandable or explainable by humans, such as in computer-aided diagnostics and credit analysis.

Introduction

Decision trees give a direct and intuitive way to obtain the classification of a new instance from a set of simple rules. Because of this characteristic, decision trees find wide use in situations in which the interpretation of the obtained results and of the reasoning process is crucial, such as in computer-aided diagnostics (CAD) and in financial credit analysis. Consumer credit analysis is an interesting example because, in many countries, one cannot simply deny credit without giving a justification, and such a justification is trivial to extract from a decision tree.

The tree on the right has been generated using the Context Free software based on the grammar shared by davdrn.

Learning decision trees

Decision trees can be simply drawn by hand based on any prior knowledge the author may have. However, their real power becomes apparent when trees are learned automatically, through some learning algorithm.

The most notable and classic examples of decision tree learning are the ID3 (Quinlan, 1986) and C4.5 (Quinlan, 1993) algorithms. Both are examples of greedy algorithms, making locally optimal decisions in the hope of producing the most general tree. Such algorithms are based on the principle of Occam’s razor, favoring simpler or smaller trees in the hope that smaller trees retain more generalization power. This preference is formalized through the specification of a hypothesis ordering criterion such as the information gain. The information gain measures, as the name implies, the gain of information obtained by using each of the attributes as the next factor to consider during a decision. The information gain can be defined as:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

where Sv is the subset of instances of S for which attribute A has value v, and the entropy of a set S over c classes is given by

Entropy(S) = − Σ_{i=1..c} p_i log2(p_i)

with p_i the proportion of instances in S belonging to class i.

However, the information gain has a bias towards attributes with a high number of possible values (Mitchell, 1997). A way to overcome this bias is to select attributes based on alternative criteria, such as the gain ratio (Quinlan, 1986), defined as:

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

where

SplitInformation(S, A) = − Σ_{v ∈ Values(A)} (|Sv| / |S|) log2(|Sv| / |S|)

In the GainRatio, the SplitInformation term attenuates the importance given to attributes with many uniformly distributed possible values.
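To make these formulas concrete, here is a minimal, illustrative C# sketch of the entropy and information gain computations for integer-coded labels (the helper names are mine, not framework API):

    using System;
    using System.Linq;

    static class InformationGain
    {
        // Entropy(S) = -sum_i p_i log2(p_i)
        public static double Entropy(int[] labels)
        {
            return -labels.GroupBy(c => c)
                          .Select(g => g.Count() / (double)labels.Length)
                          .Sum(p => p * Math.Log(p, 2));
        }

        // Gain(S, A) = Entropy(S) - sum_v (|Sv| / |S|) Entropy(Sv)
        public static double Gain(int[] attributeValues, int[] labels)
        {
            double gain = Entropy(labels);
            foreach (var subset in Enumerable.Range(0, labels.Length)
                                             .GroupBy(i => attributeValues[i]))
            {
                int[] sv = subset.Select(i => labels[i]).ToArray();
                gain -= (sv.Length / (double)labels.Length) * Entropy(sv);
            }
            return gain;
        }
    }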

Iterative Dichotomizer 3 (ID3)

The algorithm presented below is a slightly different version of the original ID3 algorithm as presented by Quinlan; the modifications are there to support multiple output labels. In each recursion of the algorithm, the attribute which best classifies the set of instances (or examples, or input-output pairs, or data) is selected according to some selection criterion, such as the InfoGain or the GainRatio.

  • ID3(instances, target_attribute, attributes)
    • Create a new root node for the tree.
    • If all instances have the target_attribute belonging to the same class c,
      • Return the tree with a single root node with label c.
    • If attributes is empty, then
      • Return the tree with a single root node with the most common label of the target_attribute in instances.
    • Else
      • A ← the attribute in attributes which best classifies instances
      • root decision attribute ← A
      • For each possible value vi of A,
        • Add a new ramification below root, corresponding to the test A = vi
        • Let instances_vi be the subset of instances having value vi for attribute A
        • If instances_vi is empty, then
          • Below this ramification, add a new leaf node with the most common value of target_attribute in instances.
        • Else, below this ramification, add the subtree given by the recursion:
          ID3(instances_vi, target_attribute, attributes – { A })
    • End
  • Return root

Difficulties and disadvantages of decision tree learning

Despite relying on Occam’s razor to guide the learning, neither the ID3 nor the C4.5 algorithm is guaranteed to produce the smallest or most general tree possible. This happens because their learning is based solely on heuristics, not on true optimality criteria. The following example, from (Esmeir & Markovitch, 2007), illustrates the learning of the concept xor (exclusive or) by the ID3 algorithm. In this example, A3 and A4 are irrelevant attributes, having absolutely no influence on the target answer. However, despite being irrelevant, ID3 will select one of them to belong to the tree. In fact, ID3 will select the irrelevant attribute A4 to be the root of the learned tree.

A1  A2  A3  A4  label
 1   0   0   1    +
 0   1   0   0    +
 0   0   0   0    –
 1   1   0   0    –
 0   1   1   1    +
 0   0   1   1    –
 1   0   1   1    +

One complication of decision tree learning is that the problem of finding the smallest consistent tree (in the sense of classifying all training samples correctly) is known to be NP-complete (Hyafil & Rivest, 1976). Moreover, the separating decision surfaces generated by such trees are always formed by cuts parallel to the attribute space axes, which can be a severely suboptimal solution (Bishop, 2007, p. 666). The example given by Bishop illustrates this problem well: to separate classes whose optimal decision margin lies at 45 degrees from one of the axes, a high number of parallel cuts will be needed, in comparison with a single cut on the diagonal such as could be given by any linear decision machine. Another disadvantage of traditional decision tree learning algorithms is that most methods use only a constant learning time and, as such, do not allow trading extra training time for better solutions. The work of (Esmeir & Markovitch, 2007) is dedicated to overcoming this problem.

The following picture shows an example of how learning by decision trees is limited to cuts parallel to the axes, as described by Bishop. The Ying-Yang classification problem is a classical example of a non-linearly separable decision problem. Decision trees, albeit not being linear classifiers, have difficulty classifying this set with simple thresholds.


Top-Left: Ying-Yang non-linear classification problem. Top-Right: Decision surface extracted by a decision tree. Bottom: Decision threshold rules extracted from the tree.

However, despite all of those shortcomings, decision trees play a major role as the basis of many successful algorithms. One interesting and notably successful application is in the field of object detection in images. The first real-time face detection algorithm (Viola & Jones, 2001) is based on a degenerate decision tree (also known as a cascade). The body recognition and segmentation algorithm used by the Xbox 360 gaming platform to process depth images generated by its companion Kinect sensor is equally based on the use of decision trees (Shotton, et al., 2011), as is the FAST algorithm for interest point detection in images (Rosten & Drummond, 2006).

I should also note that both the Viola-Jones and the FAST algorithms are available in the Accord.NET Framework for immediate use (the Viola-Jones detector, HaarObjectDetector, has also recently been updated to support multiple threads, so feel free to take a look and experiment with it!).

In sum, it is possible to say that a great part of the applicability of decision trees comes from the simple fact that they are extremely fast to evaluate. One of the reasons for this is that they are easily translated into sets of conditional instructions in a conventional computer program. The decision trees now available in the Accord.NET Framework make full use of this fact, and enable the user to actually compile the decision trees to native code on-the-fly, improving their classification performance even further.

Source code

The code presented in this section is actually part of the Accord.NET Framework. The Accord.NET Framework is an extension to the already very popular AForge.NET Framework, providing new algorithms and techniques for machine learning, computer vision and more.

The Accord.MachineLearning.DecisionTrees namespace comprises the following classes:

  • DecisionTree, the main class representing a decision tree, with methods such as Compute to compute the tree’s classification given an input vector;
  • DecisionNode, the node class of the decision tree, which may or may not have children nodes contained in a collection represented by a DecisionBranchNodeCollection;
  • DecisionVariable, a class to specify the nature of each variable processable by the tree, such as whether the variable is continuous or discrete, and what its expected or valid ranges are;
  • DecisionBranchNodeCollection, a class containing children nodes together with information about which attribute of the data should be compared with the child nodes during reasoning.

 

Their relationships can be seen in the following class diagram:

class diagram

The learning algorithms available are the ID3 algorithm discussed above and its improved version, C4.5 (which can handle continuous variables but, at this time, does not yet support pruning), both proposed and published by Ross Quinlan.

learning

Despite the somewhat complicated look, usage is rather simple, as will be shown in the next section.

Using the code

Consider, for example, the famous Play Tennis example by Tom Mitchell (1997):
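A sketch of how this dataset can be entered as a .NET DataTable (the data is Mitchell’s; the construction code below is plain ADO.NET):

    using System.Data;

    var data = new DataTable("Mitchell's Tennis Example");
    foreach (string column in new[] { "Day", "Outlook", "Temperature", "Humidity", "Wind", "PlayTennis" })
        data.Columns.Add(column);

    data.Rows.Add("D1",  "Sunny",    "Hot",  "High",   "Weak",   "No");
    data.Rows.Add("D2",  "Sunny",    "Hot",  "High",   "Strong", "No");
    data.Rows.Add("D3",  "Overcast", "Hot",  "High",   "Weak",   "Yes");
    data.Rows.Add("D4",  "Rain",     "Mild", "High",   "Weak",   "Yes");
    data.Rows.Add("D5",  "Rain",     "Cool", "Normal", "Weak",   "Yes");
    data.Rows.Add("D6",  "Rain",     "Cool", "Normal", "Strong", "No");
    data.Rows.Add("D7",  "Overcast", "Cool", "Normal", "Strong", "Yes");
    data.Rows.Add("D8",  "Sunny",    "Mild", "High",   "Weak",   "No");
    data.Rows.Add("D9",  "Sunny",    "Cool", "Normal", "Weak",   "Yes");
    data.Rows.Add("D10", "Rain",     "Mild", "Normal", "Weak",   "Yes");
    data.Rows.Add("D11", "Sunny",    "Mild", "Normal", "Strong", "Yes");
    data.Rows.Add("D12", "Overcast", "Mild", "High",   "Strong", "Yes");
    data.Rows.Add("D13", "Overcast", "Hot",  "Normal", "Weak",   "Yes");
    data.Rows.Add("D14", "Rain",     "Mild", "High",   "Strong", "No");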

In the aforementioned example, one would like to infer whether a person will play tennis or not based solely on four input variables. Those variables are all categorical, meaning there is no order between their possible values (i.e. there is no order relationship between Sunny and Rain; one is not bigger nor smaller than the other, they are just distinct). Moreover, the rows, or instances, presented above represent days on which the behavior of the person has been registered and annotated, effectively building our set of observation instances for learning.

In order to learn a decision tree, we will first convert this problem to a simpler representation. Since all variables are categorical, it does not matter whether they are represented as strings or numbers, since both are just symbols for the events they represent. Since numbers are more easily representable than text strings, we will convert the problem to use a discrete alphabet through the use of a codebook.

A codebook effectively transforms any distinct possible value for a variable into an integer symbol. For example, “Sunny” could be represented by the integer label 0, “Overcast” by 1, “Rain” by 2, and the same goes for the other variables. So:
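A sketch using the framework’s Codification filter from the Accord.Statistics.Filters namespace:

    using Accord.Statistics.Filters;

    // Create a codification codebook mapping each categorical value to an integer symbol
    var codebook = new Codification(data);

    // Translate the training data into a table of integer symbols
    DataTable symbols = codebook.Apply(data);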

Now we should specify our decision tree. We will be trying to build a tree to predict the last column, entitled “PlayTennis”. For this, we will be using “Outlook”, “Temperature”, “Humidity” and “Wind” as predictors (the variables which we will use to make our decision). Since those are categorical, we must specify, at the moment of creation of our tree, the characteristics of each of those variables. So:
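A sketch of the declaration; the DecisionVariable constructor taking a name and a symbol count is assumed from the API of that period:

    using Accord.MachineLearning.DecisionTrees;

    DecisionVariable[] attributes =
    {
        new DecisionVariable("Outlook",     3), // 3 possible values: Sunny, Overcast, Rain
        new DecisionVariable("Temperature", 3), // 3 possible values: Hot, Mild, Cool
        new DecisionVariable("Humidity",    2), // 2 possible values: High, Normal
        new DecisionVariable("Wind",        2)  // 2 possible values: Weak, Strong
    };

    int classCount = 2; // 2 possible output values: Yes, No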

Let’s now proceed and create our DecisionTree:
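Under the same assumptions:

    // Create the decision tree from the attribute descriptions and class count
    DecisionTree tree = new DecisionTree(attributes, classCount);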

Now we have created our decision tree. Unfortunately, it is not yet very useful, since we haven’t taught it the problem we are trying to predict. So we must now instantiate a learning algorithm to make it useful. For this task, in which we have only categorical variables, the simplest choice is to use the ID3 algorithm by Quinlan. Let’s do it:
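A sketch of the learning step. The ToJagged/ToArray extraction helpers and the Run method reflect the framework API of roughly this period (later versions renamed Run to Learn), so adjust to your version:

    using Accord.MachineLearning.DecisionTrees.Learning;
    using Accord.Math; // DataTable extraction extensions

    // Create the ID3 learning algorithm for the tree
    var id3learning = new ID3Learning(tree);

    // Translate the codified table into input vectors and output labels
    int[][] inputs = symbols.ToJagged<int>("Outlook", "Temperature", "Humidity", "Wind");
    int[] outputs = symbols.ToArray<int>("PlayTennis");

    // Learn the decision tree
    id3learning.Run(inputs, outputs);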

At this point, the tree has been learned. In order to ask the tree to classify new samples, we can use:
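A sketch of the classification step; Translate maps strings to symbols, and the reverse mapping method (Revert here) has changed names across framework versions:

    // Translate a new instance into its integer symbols
    int[] query = codebook.Translate("Sunny", "Hot", "High", "Strong");

    // Compute the tree's decision for the instance
    int output = tree.Compute(query);

    // Translate the decision back into a string; should be "No"
    string answer = codebook.Revert("PlayTennis", output);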

Please note that, in case any of the steps in this post doesn’t work out, it might be because the most up-to-date version in the Accord.NET Framework may have undergone some changes. In this case, please refer to the official documentation at the Accord.NET website.

Going further, a suitable representation of the learned tree is as follows: the root node tests Outlook; when Sunny, a second test on Humidity decides between No (High) and Yes (Normal); when Overcast, the answer is directly Yes; and when Rain, a test on Wind decides between No (Strong) and Yes (Weak).

However, more than just a diagram, we can also generate a .NET Expression Tree describing our decision tree. Expression trees represent code in the form of a tree of expressions, which can then be read, modified or compiled. By compiling the DecisionTree’s expression, we are able to generate code on-the-fly and let the JIT compile it down to native code at runtime, greatly improving the performance of our decision tree:
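A sketch of this step, assuming an expression-creation helper on the tree (the exact member name may vary between versions):

    // Create a .NET expression tree for the decision tree, then compile
    // it into a delegate that the JIT turns into native code
    var expression = tree.ToExpression();
    Func<double[], int> classifier = expression.Compile();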

And here is the resulting C# code, obtained by compiling the expression into a lambda function, dumping the function into a dynamic assembly, then opening and inspecting this assembly using ILSpy:
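The original listing is not reproduced here; the following is an illustrative sketch of the shape of the generated code, hand-written to match the learned tree above (not actual ILSpy output):

    // Illustrative only. Assumes the codebook assigned Sunny=0, Overcast=1,
    // Rain=2; High=0, Normal=1; Weak=0, Strong=1; No=0, Yes=1.
    static int Classify(double[] x)
    {
        if (x[0] == 0.0)                // Outlook == Sunny
            return x[2] == 0.0 ? 0 : 1; // Humidity: High -> No, Normal -> Yes

        if (x[0] == 1.0)                // Outlook == Overcast
            return 1;                   // Yes

        return x[3] == 1.0 ? 0 : 1;     // Outlook == Rain; Wind: Strong -> No, Weak -> Yes
    }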

Conclusion

Decision trees are useful tools when the problem to be solved needs to be quickly interpreted and understood by humans. Another suitable use is when the decision needs to be fast. However, decision trees, at least those trained by simple training algorithms such as ID3 and C4.5, can perform quite poorly depending on the problem. As happens with all machine learning techniques, it is futile to believe there is one true classifier which acts perfectly on all imaginable scenarios. As always, it is important to know our tools and to know in which situations each technique works best.

References

  • Bishop, C. M., 2007. Pattern Recognition and Machine Learning (Information Science and Statistics). 1st ed. 2006. Corr. 2nd printing ed. s.l.:Springer
  • Fayyad, U. M. & Irani, K. B., 1992. On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, January, 8(1), pp. 87-102.
  • Quinlan, J. R., 1986. Induction of decision trees. Machine Learning, 1(1), pp. 81-106.
  • Quinlan, J. R., 1993. C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning). 1 ed. s.l.:Morgan Kaufmann.
  • Shotton, J. et al., 2011. Real-Time Human Pose Recognition in Parts from Single Depth Images. s.l., s.n.
  • Viola, P. & Jones, M., 2001. Robust Real-time Object Detection. s.l., s.n.
  • Rosten, E. & Drummond, T., 2006. Machine Learning for High-Speed Corner Detection. s.l., s.n.
  • Mitchell, T. M., 1997. Decision Tree Learning. In: Machine Learning (McGraw-Hill Series in Computer Science). s.l.:McGraw Hill.
  • Mitchell, T. M., 1997. Machine Learning (McGraw-Hill Series in Computer Science). Boston(MA): WCB/McGraw-Hill.
  • Esmeir, S. & Markovitch, S., 2007. Anytime Learning of Decision Trees. J. Mach. Learn. Res., May, Volume 8, pp. 891-933.
  • Hyafil, L. & Rivest, R. L., 1976. Constructing Optimal Binary Decision Trees is NP-complete. Information Processing Letters, 5(1), pp. 15-17.

Matrix manipulation using Accord.NET


Matrix manipulation in Accord.NET is very straightforward: just add a new using directive on top of your class to have (literally) about a hundred new extension methods that operate directly on .NET multi-dimensional arrays.



Introduction



Accord.NET uses a somewhat different approach for matrix manipulation in contrast to other libraries. By using C# 3.0 extension methods, Accord adds several of the standard methods you would expect from a matrix library, such as linear system solving, matrix algebra and numerical decompositions, directly to the standard double[,] (or more generally T[,]) matrices of the framework.

This approach offers some advantages, since it avoids mismatches when one is using multiple libraries, each with its own and often incompatible matrix implementations. Extending multi-dimensional arrays makes the use of matrices much more natural in the .NET world, as no specialized Matrix classes have to be used and no code has to be rewritten just to use a particular Matrix implementation.

 

Please note, however, that most methods implemented by Accord are not equivalent, from a performance standpoint, to the heavily optimized versions found in more specialized numerical packages such as BLAS. Their real power comes when prototyping or realizing algorithms in code. Once an algorithm is written, several unit tests can be created to test the correctness of the code. After that, because we have an early working prototype, it will be much easier to perform optimizations as needed. The key point is to have a working version early, which can (if actually necessary) be optimized later.

Using extension methods

The first step is to include a new using directive at the top of your source file.


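The directive in question simply brings the Accord.Math extension methods into scope:

    using Accord.Math;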

This is all that is necessary after you have referenced the Accord.Math library in your project. Now you can use all of the methods and operations described below. The following list is nowhere near complete, but shows some basics and highlights of the current version of Accord.

Declaring matrices

Using standard .NET declarations

To declare matrices, no special code is required. Just create multi-dimensional
arrays as usual.
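For example, in plain C#:

    double[,] A =
    {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 },
    };

    double[] v = { 1, 2, 3 };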

Using Accord.NET extensions

Additionally, one may wish to use one of the convenience methods of Accord.NET to
create specialized matrices such as the Identity matrix, multiples of the Identity
matrix or Diagonal matrices.
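For example (Matrix.Identity and Matrix.Diagonal being the helper names as I recall them):

    double[,] I  = Matrix.Identity(3);                        // MATLAB: eye(3)
    double[,] I2 = Matrix.Identity(3).Multiply(2.0);          // MATLAB: 2 * eye(3)
    double[,] D  = Matrix.Diagonal(new double[] { 1, 2, 3 }); // MATLAB: diag([1 2 3])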


Using Accord.NET extensions with implicit typing

By using implicit variable typing, the code acquires a certain lightweight feeling and gets closer to its MATLAB®/Octave counterpart (MATLAB is a registered trademark of The MathWorks, Inc.).


An even closer alternative will be possible through Algorithm Environments, an upcoming feature in Accord. When available, this feature will allow the construction of mathematical code using Octave-like syntax directly. Octave is a high-level language, primarily intended for numerical computations, that is mostly compatible with MATLAB.

Matrix operations

All standard matrix operations, such as transpose, inverse, and column and row manipulations, are available as extension methods. In the examples below, A is the same standard double[,] matrix declared in the first section of this article. All methods return a new double[,] matrix as a result, leaving the original matrix A untouched.

The operations include the transpose, the inverse and the pseudo-inverse (MATLAB: A', inv(A), pinv(A)).
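A sketch of these calls, with extension method names per my recollection of the framework:

    double[,] At    = A.Transpose();       // MATLAB: A'
    double[,] Ainv  = A.Inverse();         // MATLAB: inv(A)
    double[,] Apinv = A.PseudoInverse();   // MATLAB: pinv(A)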

Matrix Algebra

All common algebraic operations for matrices are also implemented: addition, subtraction and multiplication. Here, division is actually a shortcut for multiplying by the inverse.

The above also works with vectors and scalars.

Examples include multiplying by a scalar and multiplying by a column vector.
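A sketch covering both the matrix algebra and the scalar and vector cases (method names assumed; A and v as declared earlier):

    double[,] B = Matrix.Identity(3);

    double[,] C = A.Add(B);          // MATLAB: A + B
    double[,] D = A.Subtract(B);     // MATLAB: A - B
    double[,] E = A.Multiply(B);     // MATLAB: A * B
    double[,] F = A.Divide(B);       // MATLAB: A / B (multiplication by the inverse)

    double[,] kA = A.Multiply(2.0);  // scalar product (MATLAB: 2 * A)
    double[]  Av = A.Multiply(v);    // product with a vector (MATLAB: A * v)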

Special element-wise operations

Element-wise operations are operations performed on each element of the matrix or vector. In MATLAB, they are sometimes known as the dot operations, since they are usually denoted by prefixing a dot to the common operators. Element-wise multiplication, division and power are available.
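A sketch (method names assumed, with A and B as before):

    double[,] P = A.ElementwiseMultiply(B);  // MATLAB: A .* B
    double[,] Q = A.ElementwiseDivide(B);    // MATLAB: A ./ B
    double[,] R = A.ElementwisePower(2.0);   // MATLAB: A .^ 2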

Vector operations

Accord.NET can also perform many vector operations, among them the many flavors of products between vectors: the inner product (a.k.a. the scalar product), the outer product (a.k.a. the matrix product; MATLAB: w = u'*v for row vectors), and the Cartesian product. The Euclidean norm, and the sum and product of a vector’s elements, are also available.
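A sketch, with vector extension names assumed from the framework’s API:

    double[] u = { 1, 2, 3 };

    double    inner = u.InnerProduct(v);      // MATLAB: u * v'
    double[,] outer = u.OuterProduct(v);      // MATLAB: u' * v
    var       cart  = u.CartesianProduct(v);  // all pairs of elements
    double    norm  = Norm.Euclidean(u);      // MATLAB: norm(u)
    double    sum   = u.Sum();                // MATLAB: sum(u)
    double    prod  = u.Product();            // MATLAB: prod(u)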

Matrix characteristics

Some common matrix characteristics, such as the determinant and trace, are readily available.
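For example (extension names assumed):

    double det = A.Determinant();   // MATLAB: det(A)
    double tr  = A.Trace();         // MATLAB: trace(A)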

Other characteristics

Other available characteristics are the summation and product of vector and matrix elements: the sum vector of a matrix, the total sum of its elements, and sums along columns or rows.
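A sketch of these calls; I am less sure of the exact dimension conventions here, so treat the comments as assumptions:

    double[] columnSums = A.Sum(0);          // sums along columns (MATLAB: sum(A))
    double[] rowSums    = A.Sum(1);          // sums along rows (MATLAB: sum(A, 2))
    double   total      = columnSums.Sum();  // sum of all elements (MATLAB: sum(A(:)))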

Linear Algebra

Linear algebra is certainly one of the most important fields of mathematics, if not the most important one. Accord includes many methods for numerical linear algebra, such as matrix inversion and matrix decompositions. Most of them were originally based on JAMA and MAPACK, but today Accord also has additions translated from EISPACK routines (mainly the Generalized Eigenvalue Decomposition, which is currently absent from JAMA).

For example, a linear system A·x = B can be solved with x = A.Solve(B) (MATLAB: x = A \ B).

Decompositions

The available decompositions include the eigenvalue decomposition, in both its standard and generalized forms (MATLAB: eig), the economy singular value decomposition (MATLAB: svd), and the QR, Cholesky and LU decompositions (MATLAB: qr, chol, lu).
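A sketch of the decomposition classes, which I believe live in the Accord.Math.Decompositions namespace:

    using Accord.Math.Decompositions;

    var evd  = new EigenvalueDecomposition(A);               // MATLAB: eig(A)
    var gevd = new GeneralizedEigenvalueDecomposition(A, B); // MATLAB: eig(A, B)
    var svd  = new SingularValueDecomposition(A);            // economy SVD (MATLAB: svd(A, 0))
    var qr   = new QrDecomposition(A);                       // MATLAB: qr(A)
    var chol = new CholeskyDecomposition(A);                 // MATLAB: chol(A)
    var lu   = new LuDecomposition(A);                       // MATLAB: lu(A)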

Special operators

There are some other useful operators available in Accord.NET. There are facilities to create index vectors (very common in MATLAB for accessing sub-portions of a matrix), select elements, find elements matching a selection criterion, reshape vectors, and so on. In MATLAB, typical examples are idx = 1:9 to create a vector of indices, B = A(idx) to select elements, idx = find(v > 2) to find the elements of v = [5 2 2 7 1 0] greater than 2, and reshape(m, 2, 2) to reshape the vector m = [1 2 3 4] into a matrix.
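Sketches of the corresponding Accord.NET calls; the helper names below (Matrix.Indices, Submatrix, Find, Reshape) reflect my best recollection of the framework’s extension API and may differ between versions:

    double[] x = { 5, 2, 2, 7, 1, 0 };

    int[] idx = Matrix.Indices(0, 3);     // index vector (zero-based, unlike MATLAB's 1:9)
    double[] sub = x.Submatrix(idx);      // element selection (MATLAB: x(idx))

    int[] found = x.Find(e => e > 2);     // indices of matching elements (MATLAB: find(x > 2))

    double[] m = { 1, 2, 3, 4 };
    double[,] M = m.Reshape(2, 2);        // note: row-major, while MATLAB's reshape is column-major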

 

More than just matrices

Accord.NET also offers many other standard mathematical functions such as the Gamma
function, log-Gamma, Digamma, Bessel functions, Incomplete beta integrals, and so
on.
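A few of these, assuming they live in the Accord.Math.Special static class as I recall:

    double g1 = Special.Gamma(0.5);    // Gamma function
    double g2 = Special.LogGamma(10);  // logarithm of the Gamma function
    double g3 = Special.Digamma(3);    // Digamma function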



For a complete listing of the framework features, please check the project page at Origo. In particular, don’t forget to check out the class diagram for the Accord.Math namespace.

Cheers!

 

Legal notice: MATLAB is a registered trademark of The MathWorks, Inc. All
other trademarks are the property of their respective owners.

Accord.NET Framework – An extension to AForge.NET

Today I have just released the first version of Accord.NET. The Accord.NET Framework is a C# framework I have developed over the years while working on several areas of artificial intelligence, machine learning and statistics.

The Accord.NET Framework extends the excellent AForge.NET Framework with new tools and libraries. In order to use Accord.NET, you must have AForge.NET already installed. The major version number of Accord.NET will be used to indicate compatibility with AForge.NET versions; thus, the first version starts at 2.0.0.

 

The framework is comprised of a set of libraries and sample applications which demonstrate their features:

  • Accord.Statistics – library with statistical analysis and other tools;
  • Accord.Imaging – extension to the AForge.NET Imaging library with new filters and routines;
  • Accord.Neuro – extension to the AForge.NET Neuro library with other learning algorithms;
  • Accord.MachineLearning – extension to AForge’s machine learning library with Support Vector Machines;
  • Accord.Audio – experimental library with filters and audio processing routines.

 

The framework has just been released, so expect bugs and unpolished or unfinished areas. The code is released under an LGPL license. For additional help, issues, and discussions, please refer to the recently created forums.

Introducing System.Console.Forms


Some days ago I started a new project just for fun. System.Console.Forms is an extension to the .NET Framework which “ports” as much as possible of System.Windows.Forms to a console environment.

 

I know a lot of people think there is absolutely no reason to develop console programs nowadays, but they are just plain wrong. It obviously isn’t mainstream, but if you use ssh a lot and cannot afford an X connection, then working with console interfaces makes a lot of sense. Also, I know there are other well-known, well-tested and proven solutions for console user interfaces, like the ncurses library, but they are not very object-oriented and not very user-friendly if you come from a .NET WinForms or Java SWT/Swing world.

Thus this project was started.

 

I know accomplishing this task will consume a lot of time and effort, so I can’t promise it will ever be complete, or even that it will not die a few months from now. But if you are interested and want to help, feel free to contact me. Much of the base code is almost done (it is already possible to create Forms and UserControls), but most of the default controls, like TextBoxes, RadioButtons and ToolStripMenus, are still incomplete or missing.


As you can see from the screenshots, it currently lacks some visual enhancements (such as drop-shadow borders and a better default form design), but this is something scheduled for future releases. The first milestone is to replicate the basic functionality and interface of nano, and the second is to fully reproduce MS-DOS edit.exe. Once this becomes possible, the library will be almost feature complete.

I don’t think I’ll ever be able to finish this by myself, so I licensed the project under the LGPL in case anyone else gets interested. The project is hosted at Google Code, and the project page is http://console-forms.googlecode.com.

Software Licenses’ Legal Notices

When releasing software, it is always good to release it under a suitable license. Licensing policies were created to protect you and your code, restricting what others can do with your code, not what you can do with your own code. The copyright holder (the author, e.g. you) is able to change licensing policies at any time but, depending on the license you choose, only as long as every author and every person who has ever contributed to your project also agrees.

Wikipedia offers a list of popular licenses and a comparison of free software licenses you can use in your projects. No matter which one you choose, a common practice is to include a license notice header in all your source files to enforce the terms you have chosen. Below is a list of common headers for the most popular software licenses. A comparison of their different implications is beyond the scope of this post, but more information is available in the Additional Information section below.


GPL – GNU General Public License (details)
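For example, the customary GPL notice header, shown here as a C-style comment block (the headers for the other licenses below follow the same pattern; see each license’s page for the exact text):

    /*
     * This program is free software: you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation, either version 3 of the License, or
     * (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     * GNU General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program.  If not, see <http://www.gnu.org/licenses/>.
     */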

LGPL – GNU Lesser General Public License (details)

MPL – Mozilla Public License (details)

BSD License (details)

MIT License (details)

Apache License (details)

CC – Creative Commons (details)

Creative Commons licenses are not intended to apply to, and should not be used for, software. I’ve included it here just as a reminder.

Public Domain (details)

This is only a suggestion, as there is no consensus on how to dedicate your work to the public domain. This excerpt was borrowed from the SQLite site, a popular piece of software that is itself dedicated to the public domain.

WARNING: Please note that the Public Domain Dedication is not a license. By using it, you do not simply carve out exceptions to your copyright; you grant your entire copyright to the public without condition. This grant is permanent and irreversible. You’ve been warned.


Additional Information


For a nice comparison of different licenses, their implications and more useful information, please check:

Comparison of Source Code Licenses

Also, before blindly adopting the GPLv3, please be sure you have completely and correctly understood everything it says. For more info about the GPL and the GPLv2 vs GPLv3 debate, see:

The GPL for Dummies
The GPLv2 vs. GPLv3 Debate
What the kernel guys are and aren’t (and really should be) saying about GPLv3
The Dangers and Problems with GPLv3
The GPLv2 vs. GPLv3 Chart

Personally, I would rather go with the GPL v2 instead.

Configuring WinPic800

WinPic800 is a free (as in beer) PIC programming software package compatible with the Tait style of hardware programmers. In addition, it is also compatible with the programmer I detailed in my last post.

However, due to the variety of styles and designs of the same programmer out on the web, one has to carefully configure the software so it can talk with a particular hardware device. To configure WinPic800 to work with the parallel Schmitt trigger programmer, please follow these instructions:

  • Download the latest version of WinPic800
  • Download the hardware definition file for this device
  • Locate the directory where you installed WinPic800
  • Put the downloaded .hwp file inside the Hardware folder
  • Open WinPic800, go to Devices > Settings > Hardware
  • Select the Tait Schmitt device entry and click OK

Now you may plug the hardware into the parallel port, insert a PIC into the programmer, turn it on and then click Device > Detect Device. The name of the microcontroller in the programmer’s socket should be shown on screen.

Battery-Powered Parallel PIC Programmer


For those who don’t know, PICs are essentially tiny computers, with built-in internal memory, processor, input/output controllers, comparators and other peripherals packaged inside a single silicon chip.

Last year, at the Microsoft Academic Cell for Robotics of the Federal University of São Carlos, I needed something to introduce our members to the electronics world. We decided to begin by using Microchip’s PICmicro microcontrollers to build simple but functional circuits: a nice way to gather knowledge ranging from basic analog electronics to low-level microcontroller programming at the same time, while also having some fun.

But then, we needed a programmer. Not the person, but the hardware required to store a software program inside the microcontroller in order to instruct it to do something useful, thus effectively programming the device.

After spending some time googling around for custom-built programmers, and after seeing they were quite simple to design, we decided to build our own as a good starting exercise. We just needed something simple and reliable that wasn’t too expensive and could be easily built by students.

 

Requirements

    To keep the budget low, we decided on David Tait’s design, which uses the computer parallel port and a parallel cable as a transmission line. Before designing the device, there were some initial modifications we decided to make to our circuit:

    It would run entirely on batteries. Having a separate power supply for the programmer is just asking for more wires around, and having more wires around usually means more mess on the work desk. Add to that the fact that power supplies here aren’t so cheap for a student budget, and creating a self-powered device was the most suitable option for us.

    But since we wanted something simple and cheap, yet still reliable and extensible, some extra care was needed because, if power supplies weren’t cheap, good batteries aren’t either. Efficiency had to be a main goal: if we had to replace the batteries all the time, the savings on the power supply wouldn’t justify the increased cost in the long run.

     

The Design

    To improve efficiency, our circuit uses a Schmitt trigger buffer for translating parallel port voltages into TTL-compatible voltages. Also, the proper use of transmission line terminations allows the device to work with longer parallel cables. Actually, I have been using 2 m cables myself and haven’t run into any issues yet.

    To regulate the unpredictable alkaline voltages down to a nice constant 5 V, we couldn’t rely on a standard linear regulator like the 7805, because it requires at least 7.5 V to work properly. In other words, using it would require at least 5 cells with at least 90% of their charge left available in our circuit. Too wasteful. In turn, an LDO regulator would be fantastic, as it works with voltages as low as 5.4 V.

    Just for clarification, I couldn’t simply use 4 or 3 cells in series because newly replaced alkaline batteries can source voltages as high as 1.68 V per cell, rather than their nominal 1.5 V, resulting in almost 7 V of microcontroller-killing voltage. Also, 9 V batteries were avoided because of their low mAh capacity, as the device should be able to power up other circuits through its ICSP connections during programming.

     

    To enter programming mode, most PICs need a voltage of around 12 V on the MCLR pin. In the past, a real current source was required for EEPROM programming; but for the newer, flash-memory-based chips, the current doesn’t actually matter, only the voltage does, as its only purpose is to signal the device to enter programming mode.

    Because of the very experimental nature of this project, we sacrificed the ability to program EEPROM chips and aimed only at the flash-memory chips (it may work with EEPROMs, but we just haven’t tested it yet). So, if we need a voltage that is almost triple our available voltage, but don’t need any actual current, it’s the perfect scenario for employing a charge pump circuit.

    Charge pumps are very efficient voltage converters but, with their finite output impedance, they can’t source much current. Those circuits use capacitors as "buckets" to pump charge from one place to another, producing higher voltages at the expense of lower output currents. It didn’t take long before we noticed that Intersil offers free samples of their 7660S "Super voltage converter" chips, which is great, although the 7660 family is very popular and available from several manufacturers all around the world.

    Nevertheless, we went on and asked Intersil for some of the 7660 samples.

     

Schematic

    All considerations taken into account, below is the final schematic for the device, featuring the LP2950-CZ LDO regulator and the ICL7660 in the "voltage tripler" configuration.

     

    Battery-Powered Parallel PIC Programmer (sch)

 

Parts list

    Qty Device (Description) Value Parts
    1 Power Switch DIP Switch SW1
    1 LED5MM Green LED1
    1 LED5MM Red LED2
    6 Small Signal Diode 1N4148 D1, D2, D3, D4, D5, D6
    1 Resistor (0.25w) 270 R17
    2 Resistor (0.25w) 100 R14, R15
    2 Resistor (0.25w) 1k R1, R11
    4 Resistor (0.25w) 4k7 R3, R4, R5, R8
    9 Resistor (0.25w) 10k R2, R6, R7, R9, R10, R12, R13, R16, R18
    3 Ceramic Capacitor 100nF C3, C8, C11
    2 Ceramic Capacitor 470pF C9, C10
    2 Electrolytic Capacitor 1uF C1, C2
    4 Electrolytic Capacitor 10uF C4, C5, C6, C7
    2 Transistor PNP BC557C T1, T2
    1 Transistor NPN BC548 T3
    1 Voltage Converter ICL7660CPA IC3
    1 Schmitt Trigger Inverter 74HC14N IC5
    1 LDO Voltage Regulator LP2950CZ-5.0 IC1
    1 SIL Pin Header 3-way J2
    1 SIL Pin Header 3-way J1
    1 SIL Socket 20-way J3
    1 DIL Socket 18-way DIL18
    1 DIL Socket 20-way DIL20
    1 Serial Connector DB9 Female DB9
    1 Parallel Connector DB25 Female DB25
    1 Battery Clip 4xAA Cell Holder  

 

Building

    The samples from Intersil arrived only a few weeks later, carefully packed inside a hard plastic box filled with foam, ensuring the chips had some comfort during their long travel overseas.

    Well, that box was really too useful to be discarded, so we figured out a nicer destiny for it. At the time I didn’t yet have my fellow Dremel rotary tool, so I just cut its top off with a knife and made some holes on the sides for screwing on a DB-25 connector (for the parallel port) and a DB-9 connector (for the ICSP connections). Here are the final pictures of the programmer mounted inside Intersil’s sample box (sorry for the crappy resolution):

 

Layout

    And finally, here is the perfboard (pre-punched circuit board) layout we used for placing components on the box. Please note this layout should not be used for PCBs.

    Battery-Powered Parallel PIC Programmer (brd)

Downloads

    All Eagle files can be obtained here. For more information on how to set up the programming software to work with this hardware, please see the next post, Configuring WinPic800.

 

 

The Microchip name and logo are registered trademarks of Microchip Technology Incorporated in the U.S.A. and other countries. The schematic presented here is copyright of its original author, licensed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.