Pages

Thursday, July 31, 2014

k-means clustering algorithm

k-means clustering is popular method for partitioning a set of observations into k clusters such that each observation belongs to cluster with the nearest mean. It is an unsupervised method.

Algorithm:
The most common k-means algorithm repeatedly does these steps until convergence
Assignment step: Determine the centroid coordinates. Assign each observation to the centroid whose means yields the least within-cluster sum of squares.
Update step: Recalculate the new means to be the centroids of the observations assigned to the clusters.

I used iris data set from UCI machine learning repository to implement k-means algorithm in R. Here is the code in R.



The cluster plot is shown below:

Monday, July 21, 2014

Why R?

I started working on R recently. The thing I like most about R is it is open source. R has many contributed packages for various domains. I am still exploring the capability and limitations of R in Machine learning and data mining domain. R comes with lots of statistical and machine learning tools. I ran k-means clustering algorithm on a sample dataset, the script was short and easier to code. I have to see how the k-means algorithm in R will scale for larger dataset.