Note: The application described in this post is running at demo.zekiah.com/heatmap. It requires Silverlight 4.
I was perusing my LinkedIn connections and noticed that quite a few had PMP certifications. I also noticed that most of those who did seemed to be in the Washington, DC area. Of course, since I live in that region, my sample could be a bit skewed, but then I started thinking out loud (via Twitter):
I would love to see a heat map showing concentrations of PMPs. I bet the DC area would be white-hot. I suspect others not so much.
Naturally, I could not let this sit. How hard could it be? It turns out it wasn’t that hard, so I threw together a small app to look at the data. While working out an approach, I decided to also look at GISP certifications because that data set is smaller and is available as a single download from the GISCI. Here’s a blow-by-blow:
Preparing the data
Both the GISCI and PMI maintain registries of current certification holders. The GISCI registry is available as a single download, so I grabbed the current list of certified GISPs (as of January 26, 2011). I only cared about location and date (which I’ll make use of in the future), so I stripped out all of the names. Then I deleted all of the non-US records to keep my geocoding easy. That left me with city, state and date. To geocode the data, I uploaded it to GeoCommons. I have blogged about this capability previously, but this was a larger data set (and only required matching cities). The processing took about 20 seconds and missed only about 75 of roughly 4500 records. Most of the misses were APO/FPO addresses, so I just removed those records. Once that was done, I left the data up on GeoCommons, where it’s still available.
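The preparation steps above are simple enough to script. Here is a minimal Python sketch, assuming a CSV export with hypothetical `Name`, `City`, `State`, `Country` and `Date` columns (the real GISCI file may be laid out differently). Unlike the process described above, it drops APO/FPO rows up front rather than after the geocoder misses them:

```python
import csv
import io

# Hypothetical column names; the actual GISCI export may label them differently.
KEEP = ("City", "State", "Date")
MILITARY_CITIES = {"APO", "FPO"}


def prepare(csv_text):
    """Strip names, keep only US rows, and drop APO/FPO addresses."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = []
    for row in reader:
        if row["Country"].strip().upper() != "US":
            continue  # non-US records are removed to keep geocoding simple
        if row["City"].strip().upper() in MILITARY_CITIES:
            continue  # military mail addresses cannot be matched to a city
        # Copy only location and date; the Name column is never carried over.
        out.append({key: row[key].strip() for key in KEEP})
    return out
```

The output has exactly the city, state and date fields needed for a city-level geocoding upload.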
For the PMP data, the basic process was the same, but getting the data was a little trickier. The PMI doesn’t offer a single download, so I had to use their query tool to query US records by the first letter of the last name. The query tool returns a maximum of 1000 records per query; 23 letters hit that cap and “Q”, “X” and “Z” returned fewer, so I ended up with about 24,000 records. I pieced these together in Excel, stripped out the names and ended up with the same fields I had for the GISP data. This data set is not complete, but I felt it was probably still representative of the geographic dispersion of certification holders, so I went forward.
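Stitching the per-letter results together is mostly bookkeeping. A sketch in Python, assuming each query's results were saved as rows with hypothetical `Name`, `City`, `State` and `Date` fields; `merge_letter_chunks` is an illustrative helper, not part of any PMI tool. It also reports which letters hit the 1000-record cap, since those are the ones whose results are truncated:

```python
import string

QUERY_CAP = 1000  # maximum records the PMI query tool returns per query


def merge_letter_chunks(chunks):
    """Combine per-letter query results into one name-free list.

    chunks maps a last-name initial ("A".."Z") to the rows returned for it.
    Returns the merged rows plus the letters whose results hit the cap,
    because those letters are truncated and the merged set is incomplete.
    """
    merged, truncated = [], []
    for letter in string.ascii_uppercase:
        rows = chunks.get(letter, [])
        if len(rows) >= QUERY_CAP:
            truncated.append(letter)
        for row in rows:
            # Keep location and date only; names are dropped here.
            merged.append({"City": row["City"], "State": row["State"], "Date": row["Date"]})
    return merged, truncated
```

The list of truncated letters is a quick way to quantify how incomplete the sample is before deciding whether it is still representative.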
The PMP data set was too large to upload and geocode on GeoCommons, which caps geocoding at 5000 records. Fortunately, Kate and FortiusOne were gracious enough to geocode it for me (I wanted to use the same geocoder for both data sets). After a little cleanup, I posted that one to GeoCommons as well.
Mapping the data
I wanted to use a heat map to show the data (for no particular reason). There are numerous ways to accomplish this, but I chose the ESRI ArcGIS API for Silverlight, primarily because a lot of my current project work involves it, so the tools are at my fingertips. It also has a HeatMapLayer class, which I had not had a chance to try yet.
So I set up a simple mapping application that, at runtime, loads the data from GeoCommons and populates each heat map layer with the points from the appropriate data set. I requested the data as CSV (one of the formats GeoCommons offers) through the GeoCommons API. I chose CSV because I only needed the latitude and longitude of each location, and CSV was the least verbose format. I parsed the data using the KBCsv parser available on CodePlex. I could also have loaded the data as GeoJSON and used JSON.Net or Vish’s outstanding GeoJSON.Net library, but CSV suited my needs in this case. Here is a screenshot showing the GISP data.
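The application itself used KBCsv from C#. As a language-neutral illustration of the same step, here is a sketch using Python's standard `csv` module, assuming the GeoCommons CSV has `latitude` and `longitude` columns (the actual header names may differ, so they are parameters):

```python
import csv
import io


def read_points(csv_text, lat_col="latitude", lon_col="longitude"):
    """Return (lat, lon) tuples from CSV text, skipping rows without coordinates."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Short rows yield None for missing fields, so fall back to "".
        lat = (row.get(lat_col) or "").strip()
        lon = (row.get(lon_col) or "").strip()
        if not lat or not lon:
            continue  # ungeocoded rows carry empty coordinates
        points.append((float(lat), float(lon)))
    return points
```

Only two numbers per record survive, which is why CSV is a lighter payload here than GeoJSON.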
Reading the data
So, did the data confirm the suspicion from my original tweet? Yes. The ESRI API calculates heat map intensity from the data visible in the current map extent. When you zoom out to the whole nation, you can clearly see that the highest concentration of both certifications, by far, is in the Washington, DC area. As you zoom in and pan, the rendering rescales to the new extent, so zooming in on other areas (and away from DC) gives a better picture of them. You can also use the intensity tool to bring out some local variations.
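To illustrate why intensity depends on the extent, here is a simplified Python sketch that bins only the visible points into a grid and scales each cell by the busiest visible cell. It is not how the ESRI HeatMapLayer actually renders; `visible_intensity` and its grid-binning approach are hypothetical stand-ins chosen to show the rescaling effect:

```python
from collections import Counter


def visible_intensity(points, extent, cells=10):
    """Bin visible (lat, lon) points into a cells x cells grid, scaled to 0..1.

    extent is (min_lon, min_lat, max_lon, max_lat) and must have positive
    width and height. Scaling uses the busiest *visible* cell, so zooming
    away from a dense area raises the relative intensity of everything else.
    """
    min_lon, min_lat, max_lon, max_lat = extent
    counts = Counter()
    for lat, lon in points:
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            col = min(int((lon - min_lon) / (max_lon - min_lon) * cells), cells - 1)
            row = min(int((lat - min_lat) / (max_lat - min_lat) * cells), cells - 1)
            counts[(row, col)] += 1
    peak = max(counts.values(), default=0)
    return {cell: n / peak for cell, n in counts.items()} if peak else {}
```

With a toy data set of nine points near DC and one near Houston (illustrative coordinates, not the post's data), a national extent gives Houston's cell an intensity of 1/9, while an extent covering only Texas gives the same cell 1.0.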
What doesn’t come across well is relative scale. There are a lot more PMPs than GISPs, but that doesn’t show in this presentation. That’s why I put a toggle on the map: there’s very little value in seeing the two data sets overlaid.
All told, it took about two hours to get the core of the application in place and maybe another hour to tidy things up. With GeoCommons, it literally took longer to prep the data than to geocode it, and the native heat mapping capability of the ESRI Silverlight API was extremely easy to use. The Silverlight implementation does shut out most mobile users and anyone on Linux (Moonlight is not yet Silverlight 4 compatible). For those reasons, I’ll continue to explore more standards-based approaches, but this combination of tools certainly helped me answer my question quickly.
This application is currently being served up live at demo.zekiah.com/heatmap.
Odd. I expected to see a GISP black hole somewhere in the middle of Massachusetts.
Whew, glad to hear the Houston PMPs are not on their way to invade San Antonio 🙂
LOL!
I sent a message to FortiusOne. It looks like other cities were located correctly so I guess they’ll want to know what’s special about Houston. 🙂
When I zoom in on the area between Houston and San Antonio, I see a hot spot in Fayette County, but hardly anything over Houston. Maybe there’s a data issue?
Yep. It looks like the Houston records were geocoded too far to the west. I’ll try to correct that. Thanks!
This is so rad. It also begs (again) the question: Why are there so few GISPs in Redlands?
Thanks. I also noticed that there don’t seem to be any in Fort Collins. I was surprised by that.
A testament that bureaucracy thrives on certifications while business ignores them?
Let’s just say the concentration in the DC area didn’t surprise me in the least.