Visualizing the impact of Urbanization in East Asia

Winning entry in the Visualizing.org & World Bank Urbanization challenge

Posted: April 2, 2015

I’m still walking around with a big smile on my face, extremely happy that I actually won the Visualizing.org & World Bank challenge on Urbanization! In the post below I’ll explain a bit about the design process and my frustrations, and share some snippets of code.

Not too long after finishing a piece for my very first data visualization challenge (the Food Poisoning challenge from the Kantar Information is Beautiful Awards, see my entry here), I came across another challenge on Visualizing.org about Urbanization in East Asia, organized together with the World Bank.

Usually what works really well for me is that somebody tells me a subject or points out an interesting insight and then leaves me completely free to create something around it. I find it more difficult to come up with ideas on the spot when I have complete freedom, including over the subject, than to come up with ideas within the boundaries of a specific subject. Without any boundaries I suddenly feel like I’m drowning in an ocean, I guess. So another challenge was a really good framework for my next project.

Data

Extensive research was performed by the World Bank to create a map of the built-up areas and population distributions across East Asia for the years 2000 and 2010 by using detailed satellite images. The dataset included information on the 869 urban areas that had an urban population over 100,000 in 2010.

The World Bank had already written an extensive report on the results in this data, “East Asia’s Changing Urban Landscape – Measuring Spatial Growth 2000-2010”. From the challenge website I felt the goal was “to create a visualization of the data to convince local policy makers to invest more on the study of urbanization”; at least that is what I understood from these two sentences:

“… projects will be used in reports to policymakers to advance the study of urbanization in East Asia”

“We encourage you to strongly consider how your visualization can be useful to policymakers”

So I read the report to understand the data and results already published and thought about the design, and came up with absolutely nothing…

My problem was the data. There were many maps available that you could already explore using the PUMA exploration tool. I still have little experience with maps (although I love the ease, beauty and flexibility of CartoDB, this is just so intuitive that it doesn’t count). I was thinking that many other entries would probably feature maps, so I wanted to try and not make the main focus of my design a map

The PUMA Exploration tool

The data also included two small Excel files: one about each country as a whole and one about the 869 cities. The city file contained, in essence, only three different dimensions: the type of city (contained within county border, spillover or fragmented), the urban land in square kilometers and the urban population, the latter two for both 2000 and 2010. Dividing urban population by land gives you urban population density. The description clearly states that these urban areas are defined differently than the “usual” cities. Any information on the city of Tokyo in the World Bank database does not describe the same thing as the urban area of Tokyo, which is defined by the satellite images. Therefore, adding external data on a city level was, I feel, not really an option. So, how to convince policy makers to invest more money when you can only use data about urban population and land increase?

Design

At Deloitte our team also has a small group of visualization enthusiasts who come together every few weeks to create products to help promote the use of visualization within the company. I asked them for help with the design. During a team dinner/brainstorm I suddenly had the idea to combine storytelling (or narrative) with interactivity instead of trying to visualize all the data at once. The idea was to show different aspects of the same data that would each give their own unique insights, point the audience to several interesting insights using storytelling (which simultaneously acted as explanations) and then have each aspect become an interactive “view”.

Introduction

To introduce the audience to the data we thought to show specifically that 200 million people had joined the urban areas between 2000 and 2010. At first I wanted to visualize this as a force clustering where all cities from 2000 would become bubbles sized to their population and cluster together and do the same for 2010. However, when that finally sort of worked I was again faced with the ‘curse’ of circles, or rather the comparison of areas; the difference between the two groups just didn’t show all that well (I forgot to take a screenshot of this result so the sketch on the left below is just to give you the idea that I was hoping for, but in reality the two groups seemed almost the same size)

Therefore I went back to the basics and created two bar charts that grow simultaneously, but where the red 2010 bar grows even higher after waiting for a fraction of a second at the height of the 2000 grey bar
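As a rough, library-free sketch of that timing (not the code behind the final piece): the function below returns the height of a bar at a given moment when it grows to a first target, pauses and then grows on to a second target. The durations, the placeholder totals and the linear easing are all assumptions made for illustration.

```javascript
// Illustrative sketch only: durations and values are made up,
// and linear easing is assumed for simplicity.
var growMs = 1000;  // assumed time to grow to a target height
var pauseMs = 300;  // assumed "fraction of a second" pause

// Height of a bar at time t (ms) that first grows from 0 to `first`,
// pauses, and then continues from `first` to `second`.
function stagedHeight(t, first, second) {
  if (t <= 0) return 0;
  if (t < growMs) return first * t / growMs;
  if (t < growMs + pauseMs) return first;
  if (t < 2 * growMs + pauseMs) {
    return first + (second - first) * (t - growMs - pauseMs) / growMs;
  }
  return second;
}

// Placeholder totals in arbitrary units (not the real population figures)
var total2000 = 100,
    total2010 = 150;

// The grey 2000 bar stops at its own height; the red 2010 bar continues
var greyAt = function(t) { return stagedHeight(t, total2000, total2000); };
var redAt = function(t) { return stagedHeight(t, total2000, total2010); };
```

Both bars are identical until the pause ends, which is what makes the extra growth of the 2010 bar stand out.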

Sketch of the 1st introduction idea (exaggerated)

Final result of the introduction

The cities

Although I didn’t want a map to be the main focus, I felt that introducing the cities as locations on a simple map would be the most intuitive way to start.

I manually selected the 25 countries from the report out of a World Map GeoJSON and played around with scaling and translation so it would fit nicely on the left of the screen. For the city locations I used a small script that I’d written some time ago in R to find the latitude and longitude of each (with the Google Maps API)
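The selection was done by hand, but the same result could be sketched in code roughly as follows. The tiny GeoJSON object, the `name` property and the helper `selectCountries` are hypothetical stand-ins; a real world map file may store country names under a different key.

```javascript
// Hypothetical, heavily trimmed stand-in for a world map GeoJSON.
// The "name" property is an assumption; real files may use other keys.
var world = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { name: "China" }, geometry: null },
    { type: "Feature", properties: { name: "France" }, geometry: null },
    { type: "Feature", properties: { name: "Japan" }, geometry: null },
    { type: "Feature", properties: { name: "Brazil" }, geometry: null },
    { type: "Feature", properties: { name: "Indonesia" }, geometry: null }
  ]
};

// Only three of the report's countries, to keep the example short
var reportCountries = ["China", "Japan", "Indonesia"];

// Keep only the features whose name appears in the list
function selectCountries(geojson, names) {
  return {
    type: "FeatureCollection",
    features: geojson.features.filter(function(feature) {
      return names.indexOf(feature.properties.name) !== -1;
    })
  };
}

var eastAsia = selectCountries(world, reportCountries);
var keptNames = eastAsia.features.map(function(f) { return f.properties.name; });
```

The filtered collection can then be drawn with a map projection whose scale and translation are tuned by hand until the countries sit nicely on the left of the screen.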

In three steps all cities with their population size in 2000 and 2010 are “grown”, starting with the smallest cities and ending with the 8 megacities with more than 10 million people
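A minimal sketch of how cities could be split into those three steps. Only the megacity threshold (more than 10 million people) comes from the text above; the 1 million boundary between small and medium, the field names and the sample cities are assumptions.

```javascript
// "big" follows the megacity definition (more than 10 million people);
// the small/medium boundary of 1 million is an assumption.
var MEGA = 10000000;
var MEDIUM = 1000000; // assumed

function sizeGroup(population2010) {
  if (population2010 > MEGA) return "big";
  if (population2010 >= MEDIUM) return "medium";
  return "small";
}

// Hypothetical cities with made-up populations
var exampleCities = [
  { name: "A", population_2010: 150000 },
  { name: "B", population_2010: 2500000 },
  { name: "C", population_2010: 12000000 },
  { name: "D", population_2010: 800000 }
];

// Collect the city names per step, in the order small, medium, big
function groupBySize(cities) {
  var groups = { small: [], medium: [], big: [] };
  cities.forEach(function(city) {
    groups[sizeGroup(city.population_2010)].push(city.name);
  });
  return groups;
}

var steps = groupBySize(exampleCities);
```

Each group can then be "grown" in its own step of the narrative, smallest first.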

Introduction of small cities

Introduction of medium cities

Introduction of big cities

Now that all cities were introduced I could start with putting focus on a specific detail. I chose to show the biggest city in 2010, the Pearl River Delta, to point out the enormous increase this city has seen between 2000 and 2010.
In some of my first versions I then went on to show a small city that had grown to more than twice its size. However, after finishing that piece I felt that the narrative was taking too long overall, so I took it out.

After the Pearl River Delta the viewer finally arrives at the first real “view” where each city or country can be investigated with the hover functionality

First view: a Map

Absolute differences

The second view I wanted to work towards was something showing absolute differences. With only two points in time available this quickly became a slopegraph. Again, the introduction to this second “view” takes about a minute, in which I explain what the audience is looking at. Simultaneously I point out one interesting insight that the data provides when viewed in a slopegraph (how practically all cities in Indonesia saw a rise in their population density, which was already high in 2000 to start with).

The slopegraph itself is in essence a scatterplot where all cities in 2000 have the same x location and all cities in 2010 also have the same x location and they both share the same y axis scale. The piece of code below gives the basics of creating a slopegraph when the city circles are already initiated



//Basic steps to create a slopegraph

//Location of the 2000 and 2010 vertical "lines"
var xAxisLocation2000 = 200,
    xAxisLocation2010 = xAxisLocation2000 + 200;
 
//Create a scale that is the same for both the 2000 and 2010 circles
var slopeScale = d3.scale.linear().range([620,120]);
//Create the domain, I know that the absolute min is the min of 2000 and the absolute max is the max of 2010
slopeScale.domain([d3.min(data, function(d) {return d.population_2000;}),
		   d3.max(data, function(d) {return d.population_2010;})]);

//Move city circles of 2000 to left
cities2000.selectAll(".city_2000")
	.transition()
	.attr("r", 3)
	.attr("cx", xAxisLocation2000)
	.attr("cy", function(d){return slopeScale(d.population_2000);});

//Move city circles of 2010 to right
cities2010.selectAll(".city_2010")
	.transition()
	.attr("r", 3)
	.attr("cx", xAxisLocation2010)
	.attr("cy", function(d){return slopeScale(d.population_2010);});

//Initiate the lines for the slopegraph
slopes.selectAll(".slopes")
	.data(data)
	.enter().append("line")
		.attr("class", "slopes")
		.style("stroke-width", 1)
		.attr("stroke", "#858483")
		.style("opacity", 0)
		.attr("x1", xAxisLocation2000)
		.attr("x2", xAxisLocation2010)
		.attr("y1",function(d){return slopeScale(d.population_2000);})
		.attr("y2",function(d){return slopeScale(d.population_2010);})
		.transition().delay(1000) //Only show the slopes once the cities are in the right location
		.style("opacity", 0.4);

I still do not really like the fact that there are too many cities in the bottom half, making it look like a big grey blur. But in the interactive section of the slopegraph, I saw that combined with the country hover functionality it was better to leave all cities in there. It gave enough insights into each country when you could fade out all of those cities in China

Second view: a slopegraph

Relative differences

Absolute differences put the focus on the big outliers, but relative differences (or the growth/decline between 2000 and 2010) can give a whole other set of insights. To keep the link with the previous two views I wanted to keep the cities as circles and build a histogram from there. At first I tried to create a graph similar to the “Tax rates across the US” example by the NY Times, so I could have an extra dimension to give to the sizes of the circles.

I tried to figure out how the NY Times calculated the positions, but these were hard coded in the data. I was able to find something on Stack Overflow that sort of creates a similar plot and I used this to perform some preliminary tests on the data (so please don’t look at the styling).

The test result for population growth

The test to create a histogram without a set base

The actual shape of the population growth Histogram

I had already created the actual histogram of population growth in R (the black and white plot) and once I compared this to my test I saw that the NY Times example was not the right plot to use for the insights that I wanted to show. Due to the big cities, several locations along the x-axis were artificially lifted up (or down), which did not necessarily have anything to do with there being a lot of cities around that growth percentage.

However, even if the test was far from perfect, I did like the appeal it had. Therefore I made all the cities the same size so it became a Streamgraph-like histogram, by which I mean that there was no base line. But again, I felt that the “boring” histogram was a better representation of the data, so in the end I created a standard histogram with the twist that there are no bars but that each city is visible as a circle.

Again, this is actually a scatterplot in which the x-axis is defined by the growth of each city in the usual sense, but where the y position depends on the number of other cities with a growth percentage that is about the same, i.e. that fall into the same growth “bin”.
I have created something similar for my “Top 2000 songs” project which I could use as the base for this. It’s a bit too much code to put in this blog (although you can find it all on my GitHub) but these are the steps:

  • You have to loop through all of your cities before plotting the data
  • Find out in which “bin” the growth percentage of a city belongs
  • Check how many other cities have already been mapped to that bin, which actually gives the “height” of that particular bin
  • Set the y location to be: the height of the growth percentage bin + 1 (use a d3.scale.linear() for the y scale and test which range you have to set to make sure the circles are no longer overlapping)
  • After all cities have been given a y-position they can be plotted in the regular sense

I added a sorting before the first step so the bigger cities lie at the bottom and you can get a bit of a stacked area chart idea
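The steps above, including the sorting, can be sketched without any library as follows. This is a simplified illustration and not the project's actual code: the bin width of 10 percentage points, the field names and the sample cities are all assumptions.

```javascript
// Hypothetical cities; "growth" is the population growth in percent
var sampleCities = [
  { name: "A", growth: 12, population_2010: 5000000 },
  { name: "B", growth: 18, population_2010: 2000000 },
  { name: "C", growth: 35, population_2010: 1000000 },
  { name: "D", growth: 15, population_2010: 8000000 },
  { name: "E", growth: -4, population_2010: 300000 }
];

var binWidth = 10; // assumed: each bin spans 10 percentage points of growth

function stackIntoBins(cities, binWidth) {
  var binCounts = {}; // number of cities already placed in each bin

  return cities
    .slice() // copy, so the original order is left untouched
    // Biggest cities first, so they end up at the bottom of their bin
    .sort(function(a, b) { return b.population_2010 - a.population_2010; })
    .map(function(city) {
      // Which bin does this growth percentage belong to?
      var bin = Math.floor(city.growth / binWidth);
      // How many cities are already in that bin (the current "height")?
      var height = binCounts[bin] || 0;
      binCounts[bin] = height + 1;
      // The city is stacked one position above the current height
      return { name: city.name, bin: bin, stackY: height + 1 };
    });
}

var placed = stackIntoBins(sampleCities, binWidth);
```

The resulting `stackY` values are simple counts; they still need to go through a y scale such as `d3.scale.linear()`, with a range chosen so that neighbouring circles no longer overlap.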

Testing color schemes & orientations

This view is actually the only one with a legend (as usual, it took quite some time to get to the current color scheme). I have purposely tried to keep legends to a minimum and used other methods instead. For example, I tried to convey that 2000 was light grey and 2010 was red by using the bar charts at the start and text colors.
For the map I felt that it wasn’t important to have a legend for circle sizes. You could hover over each city to see the actual numbers; for me it was the relative differences that mattered.

Country view

I tried several other ideas with the cities that didn’t make it into the final result. For example, I wanted to use force clustering to group cities into their countries, where you could switch between countries, type of city (contained, spillover, fragmented) or other dimensions/measures, similar to the example described here. I had it working, but there are just so many cities in China and several countries with only one city that it didn’t look right. But even more importantly, the narrative shouldn’t last that much longer; it already took 2 – 3 minutes to get to the point of the histogram view.

Therefore, I moved on to the finishing graph which we had already discussed during the brainstorm (we had the start and finish worked out during the brainstorm, the middle part remained a bit fuzzy and was built in phases). We wanted to end on a somewhat positive note, not just showing that urbanization was something that couldn’t be ignored, but that it also offered opportunities for countries to rise from poverty. The original report already showed this opportunity in a scatter/connected dot plot of GDP versus urban population. We couldn’t think of a different chart that would get this message across better than the GDP-Urban population plot, so we reused it (I had to look up some data from the World Bank, but adding external data on a country level was fine)

Step 1 was to convert the bar chart into a connected dot plot showing the growth in Urban population, bringing emphasis to the fact that practically all countries can expect ongoing urban growth since not even 50% of their population is living in a city. In step 2 the GDP dimension is added to turn it into a connected scatterplot with the message that there exists a correlation between a rising urban population and rising GDP

I played around a bit with the transformation of the bar chart to the connected dot plot. I wasn’t 100% satisfied with the transformation from the right of the screen to the left of the screen and increasing in size. However, if I just faded out the bar chart and faded in the dot plot, the connection that these were the countries we’d already been looking at for a few minutes was lessened. Therefore, I tried to make it look as good as I could and kept the transition.

I do like how the first horizontal one-dimensional connected dots turn into the scatter/connected dot plot when the second dimension of GDP is added. This turned out better than I expected and was a rather simple thing to do.
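Why this is simple can be seen in a small sketch: in step 1 both dots of a country share one fixed y position, and in step 2 only the y position changes to a GDP scale while x stays put. The scale helper, the pixel ranges, the domains, the field names and the sample country are all assumptions made for illustration.

```javascript
// Minimal linear scale helper (a stand-in for d3.scale.linear())
function linearScale(domain, range) {
  return function(value) {
    return range[0] +
      (value - domain[0]) * (range[1] - range[0]) / (domain[1] - domain[0]);
  };
}

// Assumed axes: x = urban population in percent, y = GDP (made-up domain)
var xScale = linearScale([0, 100], [100, 900]);
var yScale = linearScale([0, 50000], [700, 100]);
var baselineY = 400; // assumed fixed height of the one-dimensional plot

// Positions of the 2000 and 2010 dots of one country for a given step
function dotPositions(country, step) {
  return [2000, 2010].map(function(year) {
    return {
      year: year,
      cx: xScale(country["urban_" + year]),
      // Step 1: all dots on one line; step 2: y follows GDP
      cy: step === 1 ? baselineY : yScale(country["gdp_" + year])
    };
  });
}

// Hypothetical country with made-up values
var exampleCountry = { urban_2000: 30, urban_2010: 40, gdp_2000: 10000, gdp_2010: 20000 };

var step1 = dotPositions(exampleCountry, 1);
var step2 = dotPositions(exampleCountry, 2);
```

Because `cx` is identical in both steps, transitioning from `step1` to `step2` only moves the dots (and the line connecting them) vertically.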

Step 1. Show the growth in urbanization along 1 axis

Step 2. Add the second dimension of GDP

Afterthoughts

What actually took me a lot more time than expected are the things the viewer is not aware of: making sure that the transitions between stages run smoothly regardless of how fast the viewer clicks through the stages or goes from one view to another and back.
This difficulty is actually the reason why I did not implement the option to flip to every possible step in the narrative, only the four interactive “views”. The user can still use the “back” and “continue” buttons to quickly jump ahead or back from one of these views though.
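One general pattern for this kind of problem, shown here purely as an illustration and not necessarily what this visualization does, is to give every stage an id and let delayed steps check whether their stage is still the latest one before they change anything. The helper `createStageRunner` and the stage names are hypothetical.

```javascript
// Hypothetical helper: remembers which stage was requested last
function createStageRunner() {
  var latest = 0; // id of the most recently requested stage

  return {
    // Start a stage; stageFn receives an isCurrent() check that turns
    // false as soon as a newer stage has been requested.
    goTo: function(stageFn) {
      var id = ++latest;
      stageFn(function isCurrent() { return id === latest; });
    }
  };
}

// Usage: delayed steps call isCurrent() before touching the page
var runner = createStageRunner();
var checks = {};

runner.goTo(function(isCurrent) {
  checks.map = isCurrent;
  // a delayed step would do: if (isCurrent()) { /* update the map */ }
});

// The viewer clicks "continue" before the first stage has finished
runner.goTo(function(isCurrent) {
  checks.slope = isCurrent;
});
```

After the second call, any leftover delayed step from the first stage sees `isCurrent()` return false and can skip its update.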

Also, the visualization is not responsive; I fixed it to about 1000 x 800 pixels. This was a deliberate choice because I knew making it responsive would make things too complex for my skills and available time. However, I think this functionality would be a really good improvement.

After the first 30 hours I just stopped counting and I have to admit that in the end it had cost me way more free time than I had wanted/intended to invest in the first place. However, I didn’t want to abandon it and throw away the work already put into the project. It wasn’t that I didn’t enjoy building it, it was more that I had to put practically everything else aside and spend every moment available after work on this in the final 2-3 weeks before the deadline

Well, this has become quite a long post, but maybe that’s only fair for my biggest (or maybe just longest) project to date

Actually winning

And then March 31st came, the date on which the winner would be announced. I have to admit that I had checked the site 3 or 4 times that day already, hoping to see an update. Then around 6pm the page had some difficulty loading (I think I caught it right in between their posting of the blog), but suddenly there was a post with an image of my entry. Extremely happy! Suddenly it was truly worth all the hours, haha.
A big thanks to my colleagues who helped me point out improvements in my design and especially to Marlieke Ranzijn who helped turn my initial dull pieces of text into something worth reading!

BTW, I find the entry by Titz & Sux really nice and very innovative and it got a well deserved Honorable mention (I very much like their legend which explains how to read it)

Like I said at the start, I’m still walking around with a big smile on my face; even the storm with its winds, rain and overall dreary greyness raging outside my window right now can’t spoil my mood for some time.

I’ve even started on a data visualization project at work, about government data, for 2 days a week (which will be published in May) so now my hope of doing (more advanced) data visualization as an actual job is getting closer!

Next up will be the addition of a page listing my favorite visualization resources & my most useful blogs, books & tutorials that I am using to become a self-taught data visualization designer. Hopefully this will be of use to some of you as well
