Posted on July 22, 2013

On the creation of a “self organizing map” program in R

I’ve been using self-organizing maps (SOMs) to analyse client data for more than a year now. In the beginning I tried some commercial software, but I did not like the fact that it was too easy to just randomly click some buttons and have a map show up. I wanted to know what was happening under the hood.

I wanted to learn R anyway, so I thought creating my own SOM program in R would be a good project to learn it with. I used the som function in the kohonen package as a base, and over the following month or two I added a lot of functionality around it:

Each variable that is used in the SOM gets its own mini heatmap, plotted in a 3x3 grid

In terms of the SOM algorithm the biggest change is the parallel code I added in the C coded section using OpenMP, which took much longer to achieve than I had in mind…

Visualization

However, I was not at all satisfied with the plotting of the SOM: mini line charts, wind roses or circles in a grid ಥ_ಥ. I couldn’t really work with these once the number of variables used to cluster was bigger than about 5.

I therefore started making my own plotting functions, using hexagons. Each node is plotted as a separate hexagon (using polygons) and colored depending on its value between the minimum and maximum of that particular variable. It began small, but the visualization part kept on growing. I added functions to:

A pop-up window creates an interactive experience to fully investigate the SOM heatmaps

This made it a lot easier to answer questions such as: What sets the chosen segment or group of nodes apart from the total map? Why are these nodes grouped into one segment, and what connects them? Users can now make a selection (anything from multiple clusters down to single nodes), see the underlying values of either the nodes or the data itself, and write these to a CSV file.
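To make the hexagon idea from this section a bit more concrete, here is a minimal sketch in base R. It is not the code from my program; the function names (hex_corners, hex_centres, value_to_colour, plot_hex_heatmap) and the heat.colors palette are assumptions made up for this illustration. It only shows the principle: one polygon per node, colored by where the node's value sits between the minimum and maximum of the variable.

```r
# Corner coordinates of one pointy-top hexagon centred on (cx, cy).
# r is the centre-to-corner distance.
hex_corners <- function(cx, cy, r) {
  angles <- seq(30, 330, by = 60) * pi / 180
  list(x = cx + r * cos(angles), y = cy + r * sin(angles))
}

# Centres of an xdim-by-ydim hexagonal grid with horizontal spacing 1.
# Every second row is shifted half a cell so the hexagons interlock.
hex_centres <- function(xdim, ydim) {
  grid <- expand.grid(col = seq_len(xdim), row = seq_len(ydim))
  data.frame(
    x = grid$col + ifelse(grid$row %% 2 == 0, 0.5, 0),
    y = grid$row * sqrt(3) / 2
  )
}

# Map each value to a palette colour by its position between min and max.
value_to_colour <- function(values, palette = heat.colors(50)) {
  rng <- range(values)
  if (rng[1] == rng[2]) {
    # A constant variable has no spread, so use the first colour everywhere.
    idx <- rep(1L, length(values))
  } else {
    scaled <- (values - rng[1]) / (rng[2] - rng[1])
    idx <- 1L + floor(scaled * (length(palette) - 1))
  }
  palette[idx]
}

# Draw one heatmap: one polygon per node, filled with its colour.
plot_hex_heatmap <- function(values, xdim, ydim, main = "") {
  stopifnot(length(values) == xdim * ydim)
  centres <- hex_centres(xdim, ydim)
  colours <- value_to_colour(values)
  r <- 1 / sqrt(3)  # radius at which neighbouring hexagons touch
  plot(NA, xlim = range(centres$x) + c(-1, 1),
       ylim = range(centres$y) + c(-1, 1),
       asp = 1, axes = FALSE, xlab = "", ylab = "", main = main)
  for (i in seq_along(values)) {
    h <- hex_corners(centres$x[i], centres$y[i], r)
    polygon(h$x, h$y, col = colours[i], border = "grey40")
  }
  invisible(centres)
}
```

Calling plot_hex_heatmap(runif(30), xdim = 6, ydim = 5) draws a 6x5 map of random values. For a real SOM, the values would be one column of the node codebook, with one such plot per variable.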

Since the program was first functional about a year ago I’ve been able to use the R program on several client projects. It is now being used by several of my colleagues, both from my office and some offices abroad, which, I have to admit, I find pretty neat!

An extra window can show the statistics of the selected segment or group of nodes to see how it differs from the rest of the map

Using it during projects brought me tons of new ideas to make the program even more intuitive. A short selection of the things I still want to add:

But of course, I am usually working on other projects, which severely limits the time I can spend on my R SOM program. Most of what I have done so far was done in my spare time. But Coursera has been taking up a lot of that time recently, haha, too many interesting courses. Perhaps I’ll get the chance on my next SOM related client project.

EDIT: I created a new post that shares a small piece of code from the program: How to create hexagonal heatmaps in R. I hope you will find it useful.
