C# is now a better language than Java…

… but I really feel like I need to try Scala.

After some conversations, and after hearing wonders about the language, I decided to try something new: Scala.

Here is a list of cool references, not only for Scala, but about programming in general, kindly given by an experienced programmer I met some weeks ago.

 

Books

Programming in Scala: A comprehensive step-by-step guide

~ Martin Odersky, Lex Spoon, and Bill Venners

This book is the authoritative tutorial on the Scala programming language, co-written by the language’s designer, Martin Odersky.

Clean Code: A Handbook of Agile Software Craftsmanship

~ Robert C. “Uncle Bob” Martin

Even bad code can function. But if code isn’t clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn’t have to be that way.

Refactoring: Improving the Design of Existing Code

~ Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts

Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which is “too small to be worth doing”. However, the cumulative effect of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring, which allows you to gradually refactor a system over an extended period of time.

Implementation Patterns

~ Kent Beck

Great code doesn’t just function: it clearly and consistently communicates your intentions, allowing other programmers to understand your code, rely on it, and modify it with confidence. But great code doesn’t just happen. It is the outcome of hundreds of small but critical decisions programmers make every single day. Now, legendary software innovator Kent Beck–known worldwide for creating Extreme Programming and pioneering software patterns and test-driven development–focuses on these critical decisions, unearthing powerful “implementation patterns” for writing programs that are simpler, clearer, better organized, and more cost effective.

Code Complete: A Practical Handbook of Software Construction

~ Steve McConnell

For more than a decade, Steve McConnell, one of the premier authors and voices in the software community, has helped change the way developers write code–and produce better software. Now his classic book, CODE COMPLETE, has been fully updated and revised with best practices in the art and science of constructing software. Whether you’re a new developer seeking a sound introduction to the practice of software development or a veteran exploring strategic new approaches to problem solving, you’ll find a wealth of practical suggestions and methods for strengthening your skills. Topics include design, applying good techniques to construction, eliminating errors, planning, managing construction activities, and relating personal character to superior software. This new edition features fully updated information on programming techniques, including the emergence of Web-style programming, and integrated coverage of object-oriented design. You’ll also find new code examples–both good and bad–in C++, Microsoft(r) Visual Basic(r), C#, and Java, though the focus is squarely on techniques and practices.

Amazon Editorial Review

The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition (2nd Edition)

~ Frederick P. Brooks

The classic book on the human elements of software engineering. Software tools and development environments may have changed in the 21 years since the first edition of this book, but the peculiarly nonlinear economies of scale in collaborative work and the nature of individuals and groups has not changed an epsilon. If you write code or depend upon those who do, get this book as soon as possible — from Amazon.com Books, your library, or anyone else. You (and/or your colleagues) will be forever grateful. Very Highest Recommendation.

 

Blogs & Online Resources

Joel on Software

A weblog by Joel Spolsky, a programmer working in New York City, about software and software companies.

The Artima Developer Community

Artima.com is a collection of resources about Java, Jini, the JVM, and object oriented design.

Mark’s Blog

Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security. Among other feats, Russinovich was the man behind the discovery of the Sony rootkit in Sony DRM products in 2005.

Object Mentor’s Blog

A team of consultants who mentor their clients in C++, Java, OOP, Patterns, UML, Agile Methodologies, and Extreme Programming.

 

People to follow

@unclebobmartin

Known colloquially as “Uncle Bob”, Robert Cecil Martin has been a software professional since 1970 and an international software consultant since 1990. In 2001, he led the group that created Agile software development from Extreme programming techniques. He is also a leading member of the Software Craftsmanship movement.

He is founder and president of Object Mentor Inc., a team of consultants who mentor their clients in C++, Java, OOP, Patterns, UML, Agile Methodologies, and Extreme Programming.

@KentBeck

Kent Beck is an American software engineer and the creator of Extreme Programming and Test Driven Development. Beck was one of the 17 original signatories of the Agile Manifesto in 2001.

Google Chrome Bug?

It seems there is a minor bug in Google Chrome. Apparently, you can’t click links that are in the same horizontal line as the status bar that pops up from the bottom left corner.

For example, clicking the link marked with a red circle in the image below does nothing.

[Image: screenshot of Chrome with the unclickable link circled in red]

Well, this obviously isn’t a show stopper, but I think it’s a bug nevertheless.

Logistic Regression in C#

Logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. Logistic regression is used extensively in the medical and social sciences as well as marketing applications such as prediction of a customer’s propensity to purchase a product or cease a subscription.

The code presented here is part of the Accord.NET Framework. The Accord.NET Framework is a framework for developing machine learning, computer vision, computer audition, statistics and math applications in .NET. To use the framework in your projects, install it by typing Install-Package Accord.MachineLearning in your IDE’s NuGet package manager.

Overview

Logistic regression is a generalized linear model used for binomial regression. Like many forms of regression analysis, it makes use of several predictor variables that may be either numerical or categorical.

Logistic Sigmoid Function

The logistic sigmoid function is given by

g(z) = 1 / (1 + Exp(-z))

where, in the context of logistic regression, z is called the logit.
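As a small illustration, the sigmoid can be written in a few lines of Python. This is only a sketch and not part of the Accord.NET code; the function name `sigmoid` and the branch on the sign of z (used to avoid overflow in `exp`) are choices made here for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form
    # exp(z) / (1 + exp(z)) so exp() never sees a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-4.0, 0.0, 4.0):
    print(z, sigmoid(z))
```

The logit z can be any real number, while g(z) always stays between 0 and 1, which is what lets it be read as a probability.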

Logistic Regression Model

The logistic regression model is a generalized linear model. This means that it is just a linear regression model taken as input for a non-linear link function. The linear model has the form

z = c_1 x_1 + c_2 x_2 + … + c_n x_n + i = c^T x + i

where c is the coefficient vector, i is the intercept value and x is the observation vector for n variables. In the context of logistic regression, z is called the logit. The logit is then applied as input to the nonlinear logistic sigmoid function g(z), giving a probability as the result.

So in a binomial problem where we are trying to determine whether an observation belongs to class C1 or class C2, the logistic model tells us that:

p(C1|x) = g(c^T x + i)

p(C2|x) = 1 - p(C1|x)

where p(C1|x) denotes the probability of C1 being true given that x was observed.
In other words, it denotes the probability of x belonging to class C1.
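To make the two formulas concrete, here is a minimal Python sketch that evaluates them for one observation. The coefficient, intercept and input values are made up for illustration, and `logistic_probability` is a hypothetical helper name, not a framework function.

```python
import math


def logistic_probability(c, x, i):
    """Return p(C1|x) = g(c^T x + i) for coefficients c and intercept i."""
    z = sum(cj * xj for cj, xj in zip(c, x)) + i  # the logit
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model and observation, chosen only for illustration.
c = [0.8, -0.4]
i = -0.2
x = [1.0, 1.5]

p_c1 = logistic_probability(c, x, i)
p_c2 = 1.0 - p_c1
print(p_c1, p_c2)
```

Here the logit works out to about zero, so both classes come out roughly equally likely.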

Coefficients

The coefficients for the logistic regression are the values which multiply each observation variable from a sample input vector. The intercept is analogous to the independent term in a polynomial linear regression model or the threshold or bias value for a neuron in a neural network model.

Odds Ratio

After computation of the logistic regression model, every coefficient will have an associated measure called the odds ratio. The odds ratio is a measure of effect size, describing the strength of association or non-independence between two binary data values, and can be computed by raising Euler’s number to the power of the coefficient value.

Odds ratio_c = e^c
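A quick numeric illustration in Python (the coefficient values are invented for the example):

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio associated with a logistic regression coefficient."""
    return math.exp(coefficient)


# Hypothetical coefficients: zero, positive and negative.
for c in (0.0, math.log(2.0), -0.5):
    print(c, odds_ratio(c))
```

A coefficient of zero gives an odds ratio of 1 (no effect), a coefficient of ln 2 doubles the odds for each unit increase of the variable, and a negative coefficient gives an odds ratio below 1.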

Standard Error

The standard error for the coefficients can be retrieved from the inverse Hessian matrix calculated during the model fitting phase and can be used to give confidence intervals for the odds ratio. The standard error for the i-th coefficient of the regression can be obtained as:

SE_i = sqrt(diag(H^{-1})_i)

Confidence Intervals

The 95% confidence interval around a logistic regression coefficient is the coefficient plus or minus 1.96 * SE_i, where SE_i is the standard error for coefficient i. We can then define:

95% C.I._i = <lower, upper> = <coefficient_i - 1.96 * SE_i, coefficient_i + 1.96 * SE_i>
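The two formulas above can be sketched in Python as follows. The inverse Hessian and the coefficients are made-up numbers, and the helper names are hypothetical; in a real analysis the inverse Hessian comes out of the fitting step. Exponentiating the interval bounds gives the corresponding interval for the odds ratio.

```python
import math


def standard_errors(inverse_hessian):
    """SE_i = sqrt of the i-th diagonal element of the inverse Hessian."""
    return [math.sqrt(inverse_hessian[k][k]) for k in range(len(inverse_hessian))]


def confidence_interval(coefficient, se, z=1.96):
    """95% confidence interval (for z = 1.96) around a coefficient."""
    return (coefficient - z * se, coefficient + z * se)


# Hypothetical values, for illustration only.
inverse_hessian = [[0.04, 0.01],
                   [0.01, 0.09]]
coefficients = [0.5, -1.2]

ses = standard_errors(inverse_hessian)
intervals = [confidence_interval(c, se) for c, se in zip(coefficients, ses)]
for c, se, (lower, upper) in zip(coefficients, ses, intervals):
    # The interval for the odds ratio is the exponential of the bounds.
    print(c, se, (lower, upper), (math.exp(lower), math.exp(upper)))
```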

Wald Statistic and Wald’s Test

The Wald statistic is the ratio of the logistic coefficient to its standard error. A Wald test is used to test the statistical significance of each coefficient in the model. A Wald test calculates a Z statistic, which is:

z = coefficient / standard error

This z value can be squared, yielding a Wald statistic with a chi-square distribution, or, alternatively, it can be taken as is and compared directly with a Normal distribution.

The Wald test outputs a p-value indicating the significance of individual independent variables. If the value is below a chosen significance threshold (typically 0.05), then the variable plays some role in determining the outcome variable that most likely is not the result of chance alone. However, there are some problems with the use of the Wald statistic. The likelihood-ratio test is a better alternative to the Wald test.
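As a rough sketch, the z form of the test can be computed with nothing but the Python standard library. The function name `wald_test` and the input values are illustrative; the two-sided p-value under the standard Normal distribution is obtained through the complementary error function.

```python
import math


def wald_test(coefficient: float, standard_error: float):
    """Return the Wald z statistic and its two-sided Normal p-value."""
    z = coefficient / standard_error
    # P(|Z| > |z|) for a standard Normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error.
z, p = wald_test(0.98, 0.5)
print(z, p)
```

Squaring z and comparing it against a chi-square distribution with one degree of freedom yields the same p-value, so the two variants described above agree.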

Likelihood-Ratio and Chi-Square Test

The likelihood-ratio is the ratio of the likelihood of the full model over the likelihood of a simpler nested model. When compared to the null model the likelihood-ratio test gives an overall model performance measure. When compared to nested models, each with one variable omitted, it tests the statistical significance of each coefficient in the model. These significance tests are considered to be more reliable than the Wald significance test.

The likelihood-ratio is a chi-square statistic given by:

D = -2 \ln\left( \frac{\text{likelihood for first model}}{\text{likelihood for second model}} \right)

The model with more parameters will always fit at least as well (its log-likelihood is at least as large). Whether it fits significantly better and should thus be preferred can be determined by deriving the probability or p-value of the obtained difference D. In many cases, the probability distribution of the test statistic can be approximated by a chi-square distribution with (df1 − df2) degrees of freedom, where df1 and df2 are the degrees of freedom of models 1 and 2 respectively.
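For illustration, here is a small Python sketch of the test for the common case where the two nested models differ by exactly one parameter, so the chi-square distribution has one degree of freedom. That restriction is an assumption of this sketch (it keeps the p-value expressible with `math.erfc`), and the log-likelihood values are invented.

```python
import math


def likelihood_ratio_test(loglik_simple: float, loglik_full: float):
    """Return (D, p-value) for nested models differing by one parameter.

    D = -2 * ln(L_simple / L_full) = -2 * (loglik_simple - loglik_full).
    """
    d = -2.0 * (loglik_simple - loglik_full)
    # Survival function of a chi-square with 1 degree of freedom:
    # P(X > d) = erfc(sqrt(d / 2)). Guard against tiny negative rounding.
    p_value = math.erfc(math.sqrt(max(d, 0.0) / 2.0))
    return d, p_value


# Hypothetical log-likelihoods of a simpler model and a fuller model.
d, p = likelihood_ratio_test(-52.0, -50.0)
print(d, p)
```

With these numbers D is 4.0 and the p-value falls just under 0.05, so the extra parameter would be judged significant at the usual threshold.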

Regression to a Logistic Model

If we consider the mapping

φ(<x_1, x_2, …, x_n>) = <x_1, x_2, …, x_n, 1>

the logistic regression model can also be rewritten as

p(C1|φ) = g(w^T φ) = f(φ, w)

so that w contains all coefficients and the intercept value in a single weight vector. Doing so allows the logistic regression model to be expressed as a common optimization model in the form f(φ, w), so that many standard non-linear optimization algorithms can be directly applied in the search for the parameters w that best fit the probability of class C1 given an input vector φ.
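The mapping is easy to check numerically. In the sketch below (plain Python, hypothetical names and values), appending a constant 1 to the input and appending the intercept to the coefficient vector gives the same probability as the original formulation.

```python
import math


def phi(x):
    """Append a constant 1 so the intercept becomes an ordinary weight."""
    return list(x) + [1.0]


def f(phi_x, w):
    """Logistic model in the single-weight-vector form g(w^T phi)."""
    z = sum(wj * pj for wj, pj in zip(w, phi_x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients c, intercept i and observation x.
c, i = [0.3, -0.7], 0.25
x = [2.0, 1.0]

w = c + [i]  # coefficients and intercept in one vector
original = 1.0 / (1.0 + math.exp(-(sum(cj * xj for cj, xj in zip(c, x)) + i)))
print(f(phi(x), w), original)
```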

Likelihood function

The likelihood function for the logistic model, where y_n = g(w^T φ_n) is the predicted probability for the n-th sample and t_n is its target value (0 or 1), is given by:

p(t|w) = \prod_{n=1}^N y_n^{t_n} \{ 1 - y_n \}^{1-t_n}

but, as the log of a product equals the sum of the logs, taking the log of the likelihood function results in the log-likelihood function in the form:

\ln p(t|w) = \sum_{n=1}^N \{ t_n \ln y_n + (1-t_n) \ln (1 - y_n) \}

Furthermore, if we take the negative of the log-likelihood, we will have an error function called the cross-entropy error function:

E(w) = -\ln p(t|w) = - \sum_{n=1}^N \{ t_n \ln y_n + (1-t_n) \ln (1 - y_n) \}

which both gives better numerical accuracy and enables us to write the error gradient in the same form as the gradient of the sum-of-squares error function for linear regression models (Bishop, 2006).
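A minimal Python sketch of the error function and its gradient follows. It assumes each sample is already in the augmented form φ (with a trailing 1), uses a tiny made-up data set, and clamps the probabilities away from 0 and 1 to keep the logarithms finite; the gradient used is the standard result, the sum over n of (y_n - t_n) φ_n.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(w, samples, targets, eps=1e-12):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]."""
    total = 0.0
    for phi_n, t_n in zip(samples, targets):
        y_n = sigmoid(sum(wj * pj for wj, pj in zip(w, phi_n)))
        y_n = min(max(y_n, eps), 1.0 - eps)  # keep log() finite
        total += t_n * math.log(y_n) + (1 - t_n) * math.log(1.0 - y_n)
    return -total


def gradient(w, samples, targets):
    """Gradient of E(w): sum_n (y_n - t_n) * phi_n."""
    grad = [0.0] * len(w)
    for phi_n, t_n in zip(samples, targets):
        y_n = sigmoid(sum(wj * pj for wj, pj in zip(w, phi_n)))
        for j, pj in enumerate(phi_n):
            grad[j] += (y_n - t_n) * pj
    return grad


# Hypothetical data: one input variable plus the constant 1.
samples = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [0, 1, 1]
w0 = [0.0, 0.0]
print(cross_entropy(w0, samples, targets), gradient(w0, samples, targets))
```

With all weights at zero every y_n is 0.5, so the error is N ln 2 and the gradient reduces to the sum of (0.5 - t_n) φ_n.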

Another important detail is that the error function E(w) is convex (equivalently, the log-likelihood is concave), so it has no spurious local minima. Any local minimum will also be a global minimum, so one does not need to worry about getting trapped in a local valley when walking down the error function.

Iterative Reweighted Least-Squares

The method of Iterative Reweighted Least-Squares is commonly used to find the maximum likelihood estimates of a generalized linear model. In most common applications, an iterative Newton-Raphson algorithm is used to calculate those maximum likelihood values for the parameters. This algorithm uses second order information, represented in the form of a Hessian matrix, to guide incremental coefficient changes. This is also the algorithm used in this implementation.

In the Accord.NET machine learning framework, Iterative Reweighted Least-Squares is implemented in the IterativeReweightedLeastSquares class.

Source Code

Below is the main code segment for the logistic regression, performing one pass of the Iterative Reweighted Least-Squares algorithm.

Furthermore, as the error function is convex, the Logistic Regression Analysis can perform regression without having to experiment with different starting points. Below is the code to compute a logistic regression analysis. The algorithm iterates until the largest absolute parameter change between two iterations becomes less than a given limit, or the maximum number of iterations is reached.
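The framework's own listings do not appear in this text. As a stand-in, the following is a minimal pure-Python sketch of the idea, not the Accord.NET source: `irls_step` performs one Newton-Raphson update w ← w − H⁻¹∇E with H = Σ y_n(1 − y_n) φ_n φ_nᵀ, and `fit` repeats it with the stopping rule described above. All names, the tiny data set and the default tolerance are assumptions made for the example, and the samples are expected in the augmented form φ (trailing 1).

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def irls_step(w, samples, targets):
    """One pass: build gradient and Hessian, then take a Newton step."""
    n = len(w)
    grad = [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for phi_n, t_n in zip(samples, targets):
        y_n = sigmoid(sum(wj * pj for wj, pj in zip(w, phi_n)))
        r_n = y_n * (1.0 - y_n)  # weight of this sample
        for j in range(n):
            grad[j] += (y_n - t_n) * phi_n[j]
            for k in range(n):
                hess[j][k] += r_n * phi_n[j] * phi_n[k]
    delta = solve(hess, grad)
    return [wj - dj for wj, dj in zip(w, delta)]


def fit(samples, targets, tolerance=1e-8, max_iterations=50):
    """Iterate until the largest absolute parameter change is below tolerance."""
    w = [0.0] * len(samples[0])
    for _ in range(max_iterations):
        w_new = irls_step(w, samples, targets)
        largest_change = max(abs(a - b) for a, b in zip(w_new, w))
        w = w_new
        if largest_change < tolerance:
            break
    return w


# Hypothetical, deliberately non-separable data: [x, 1] rows and 0/1 targets.
samples = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]]
targets = [0, 0, 1, 0, 1, 1]
w = fit(samples, targets)
print(w)
```

The data set is kept non-separable on purpose: with perfectly separable classes the maximum likelihood estimate does not exist and the weights would grow without bound, so a sketch this simple would not converge.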

Using the code

Code usage is very simple. To perform a Logistic Regression Analysis, simply create an object of the type LogisticRegressionAnalysis and pass the input and output data as constructor parameters. Optionally, you can pass the variable names as well.

Then just call Compute() to compute the analysis.

After that, you can display information about the regression coefficients by binding the CoefficientCollection to a DataGridView.

Sample application

The sample application creates Logistic Regression Analyses by reading Excel workbooks. It can also perform standard Linear Regressions, although there aren’t many options available to specify linear models.

[Images: sample application screenshots lr-1 and lr-6]

Example data set adapted from http://www.statsdirect.co.uk/help/regression_and_correlation/logi.htm.

 

[Images: sample application screenshots lr-2 and lr-3]

The image on the left shows information about the analysis performed, such as coefficient values, standard errors, p-value for the Wald statistic, Odds ratio and 95% confidence intervals. It also reports the log-likelihood, deviance and likelihood-ratio chi-square test for the final model. The newest available version can also compute likelihood-ratio tests for each coefficient, although not shown in the image.

Final Considerations

The logistic regression model can be seen as exactly equivalent to a single-layer neural network with a single neuron using a sigmoid activation function, trained by Newton’s method. Thus we can say that logistic regression is just a special case of a neural network. However, one possible (and maybe unique) advantage of logistic regression is that it allows simpler interpretation of its results in the form of odds ratios and statistical hypothesis testing. Statistical analysis using neural networks is also possible, but some may argue it is not as straightforward as using ordinary logistic regression.

Multilayer perceptron neural networks with sigmoid activation functions can be created using the Accord.Neuro package and namespace.

The Kernel Trick

K(x,y) = <φ(x),φ(y)>

In machine learning, the kernel trick is a trick that looks naive but has an almost unbelievable power: it turns any linear algorithm that can be expressed in terms of inner products into a non-linear algorithm.

The idea is almost funny. You have a technique (such as PCA) that lets you work with linear functions, and you have some arbitrary non-linear function that does not meet this criterion. With the kernel trick, you can still make your technique work. All you have to do is increase the number of dimensions of the space you are working in. And increase it a lot. More specifically, you can simply move your problem to a space in which there is an independent dimension for each of the possible inputs of your function!

Video by Udi Aharoni demonstrating how points that are not linearly separable in a two-dimensional space can almost always be linearly separated in higher-dimensional spaces.

Once this mapping is done, any function can be represented as a linear operation, because all possible inputs will be completely independent (since each one lies in a different dimension)! But of course, if your function accepts a continuous range of inputs, this will require an infinite-dimensional space, such as a Hilbert space, which is hard to work in. In many applications, as in PCA, all you need is an inner product, which in this case you can compute in the original (low-dimensional) space. And computing this inner product is the role of the kernel function.
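A small Python sketch can make this concrete for one well-known case, chosen here purely as an illustration: the homogeneous polynomial kernel of degree 2 in two dimensions. Its explicit feature map φ sends a 2D point into 3D, yet K(x, y) = (x·y)² gives the same inner product without ever leaving the original space.

```python
import math


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def kernel(x, y):
    """K(x, y) = (x . y)^2, computed in the original 2D space."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
print(dot(phi(x), phi(y)), kernel(x, y))
```

Both expressions evaluate to the same number (up to floating-point rounding), which is exactly the identity K(x, y) = <φ(x), φ(y)>. For kernels whose feature space is infinite-dimensional, only the right-hand computation is feasible.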

See also: