5 votes
1707 reads

Regridding Data: How to choose the best method?

Many applications require regridding a data set, either to change the grid spacing or just to shift the origin. There are several methods for interpolating between the two grids: nearest neighbor is the simplest, and others include bilinear, spline, and inverse distance interpolation.
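
To make the difference between methods concrete, here is a minimal one-dimensional sketch in plain Python comparing nearest-neighbor and linear interpolation (the 1-D analogue of bilinear). The function names `regrid_nearest` and `regrid_linear` and the sample grids are made up for illustration; a real application would more likely use a library routine.

```python
from bisect import bisect_right

def regrid_nearest(src_x, src_y, dst_x):
    """Nearest neighbor: copy the value of the closest source node."""
    out = []
    for x in dst_x:
        i = min(range(len(src_x)), key=lambda k: abs(src_x[k] - x))
        out.append(src_y[i])
    return out

def regrid_linear(src_x, src_y, dst_x):
    """Piecewise-linear interpolation; src_x must be strictly increasing."""
    out = []
    for x in dst_x:
        # Find the source interval [src_x[i], src_x[i + 1]] that contains x.
        # The index is clamped, so points outside the source grid are
        # extrapolated from the first or last interval.
        i = bisect_right(src_x, x) - 1
        i = max(0, min(i, len(src_x) - 2))
        t = (x - src_x[i]) / (src_x[i + 1] - src_x[i])
        out.append((1 - t) * src_y[i] + t * src_y[i + 1])
    return out

# Hypothetical source grid (spacing 1.0) and a finer, shifted target grid.
src_x = [0.0, 1.0, 2.0, 3.0]
src_y = [0.0, 10.0, 20.0, 30.0]
dst_x = [0.25, 0.75, 1.25, 1.75, 2.25, 2.75]

nearest = regrid_nearest(src_x, src_y, dst_x)
linear = regrid_linear(src_x, src_y, dst_x)
print(nearest)
print(linear)
```

Nearest neighbor keeps only values that already exist on the source grid, which gives a step-like result, while linear interpolation produces intermediate values. Which one is "best" depends on the data: nearest neighbor is the safer choice for categorical fields, for example.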

4 votes
1131 reads

Super Useful Wiki Page on Advanced R Programming

Hadley Wickham is working on a book, "Advanced R Programming", that is expected to be published in Chapman and Hall's R series early this year. "The book is designed primarily for R users who want to improve their programming skills and understanding of the language."

4 votes
1138 reads

Cherry Picking and Global Warming

The recent extreme cold event across North America has once again opened up discussion of the impacts of global warming in the media and on many blogs. Some scientists claim that Arctic warming (a result of global warming) has weakened the westerly winds in the upper atmosphere, known as the jet stream, and caused unusual waviness in the stream. This waviness brings cold Arctic air southward and causes frigid temperatures and record wind chills over many parts of North America, especially the continental US.

3 votes
1247 reads

Relationships between Probability Distributions

Probably the best-known relationship between two probability distributions is that a random variable Y has a log-normal distribution if log(Y) is normally distributed. In fact, there are many such interconnections between different probability distributions, as shown in the following figure:
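
As a quick numerical check of the log-normal relationship (a sketch that is separate from the figure), the snippet below draws samples with Python's standard `random.lognormvariate` and confirms that their logarithms behave like normal samples. The parameter values, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable
mu, sigma = 1.0, 0.5        # parameters of the underlying normal distribution
n = 20000

# Y is log-normal with parameters (mu, sigma).
y = [rng.lognormvariate(mu, sigma) for _ in range(n)]

# If Y is log-normal, log(Y) should look normal with mean mu and sd sigma.
log_y = [math.log(v) for v in y]
log_mean = statistics.fmean(log_y)
log_sd = statistics.stdev(log_y)

# The mean of a log-normal variable is exp(mu + sigma**2 / 2).
y_mean = statistics.fmean(y)
theoretical_mean = math.exp(mu + sigma ** 2 / 2)

print(f"mean of log(Y): {log_mean:.3f} (expected about {mu})")
print(f"sd of log(Y):   {log_sd:.3f} (expected about {sigma})")
print(f"mean of Y:      {y_mean:.3f} (theory {theoretical_mean:.3f})")
```

The sample statistics only approximate the theoretical values, so the comparison is meaningful up to sampling error.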

3 votes
1513 reads

How to build a machine learning document classification system from scratch using R

Timothy D'Auria shows how to build a machine learning document classification system from scratch in less than 30 minutes using R. He uses a text mining approach to identify the speaker of unmarked presidential campaign speeches. Other applications of this work include brand management, auditing, fraud detection, and electronic medical records.

2 votes
2845 reads

Speeding up Computations in MATLAB using GPU

Graphics Processing Units (GPUs) were initially designed to provide fast, smooth graphics on computers, but in recent years they have increasingly been used to speed up computations through parallel programming. The benefit of GPUs over CPUs is the number of cores: a regular computer might have 4 CPU cores but around 100 GPU cores.

2 votes
1694 reads

Statistics Done Wrong

“Statistics Done Wrong” is an interesting guide by Alex Reinhart to the most common statistical errors committed in science.

2 votes
807 reads

Infinite Mixture Models with Nonparametric Bayes and the Dirichlet Process

Edwin Chen has a brief yet complete introduction to nonparametric Bayes, and specifically to the Dirichlet Process (DP). He explains the concept of the Dirichlet Process as well as three different representations of it: the Chinese Restaurant Process, the Polya Urn Model, and the Stick-Breaking Process. Finally, he gives quite an interesting example of using a Dirichlet Process Gaussian Mixture model to cluster the items on McDonald's menu. You can find the article here:
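
For a feel of two of these representations before reading the article, here is a minimal sketch in plain Python. It is not taken from Edwin Chen's article: the function names, the concentration parameter `alpha = 2.0`, and the truncation to a finite number of sticks are illustrative assumptions.

```python
import random

def stick_breaking(alpha, n_sticks, rng):
    """First n_sticks mixture weights of a (truncated) stick-breaking process."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)  # fraction broken off what is left
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining               # remaining = mass not yet assigned

def chinese_restaurant_process(alpha, n_customers, rng):
    """Seat customers one by one; return each customer's table and table sizes."""
    counts, seating = [], []
    for n in range(n_customers):
        # With n customers already seated, existing table k is chosen with
        # probability counts[k] / (n + alpha) and a new table with
        # probability alpha / (n + alpha).
        r = rng.random() * (n + alpha)
        table = len(counts)                 # default: open a new table
        for k, c in enumerate(counts):
            if r < c:
                table = k
                break
            r -= c
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        seating.append(table)
    return seating, counts

rng = random.Random(42)
alpha = 2.0                                 # hypothetical concentration parameter
weights, leftover = stick_breaking(alpha, 10, rng)
seating, counts = chinese_restaurant_process(alpha, 100, rng)
print("stick weights:", [round(w, 3) for w in weights])
print("table sizes:  ", counts)
```

In both views a larger `alpha` spreads the mass over more components: the sticks shrink more slowly and customers open new tables more often, which is why the number of clusters does not have to be fixed in advance.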

4 votes
1238 reads

Modern Bayesian Nonparametrics - NIPS 2011

An interesting talk on "Modern Bayesian Nonparametrics" by P. Orbanz and Y.W. Teh.

1 vote
868 reads

Bayesian nonparametrics in document and language modeling

A nice talk on "Bayesian nonparametrics in document and language modeling" by Yee Whye Teh. It starts with a brief introduction to Dirichlet Processes and Hierarchical Dirichlet Processes, and then applies Hierarchical Dirichlet Processes to document and language modeling.