Carving Up GIS StackExchange

So the data server that I was working with was having issues and, while waiting for the IT staff to resolve things, I decided to browse the GIS StackExchange site. It’s been online for quite some time now, though I am only an occasional user. One thing led to another and I found myself playing with the StackExchange API, which returns various information for a particular site in JSON format.

One of the things that got people excited about the site was that it was neutral territory, not directly controlled by a particular vendor or organization. It remains one of the few places where you can see hyperbole-free discussion of Esri tools right next to that of open-source tools and general GIS concepts. Since it’s been online for a while now, it has a lot of posts covering many different topics. I had some time on my hands, so I decided to pull the information for the 100 most “popular” tags using the StackExchange API. With JSON in hand, I parsed it into a CSV with Python and loaded the data into Excel to take a look.
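For anyone who wants to try the same thing, here is a rough sketch of the JSON-to-CSV step. It is not the script I actually ran. It assumes the response text is already in hand and that it wraps the tags in an "items" list with "name" and "count" fields, which is how the API's tags listing is laid out as far as I know. The function name and the two-tag sample are made up for illustration; only the two counts come from the real data.

```python
import csv
import io
import json

# Sample shaped like a tags response. The "items"/"name"/"count" layout is an
# assumption; the two counts are the top and 100th tags from the real data.
SAMPLE_RESPONSE = (
    '{"items": [{"name": "arcgis", "count": 1659},'
    ' {"name": "export", "count": 58}]}'
)


def tags_to_csv(response_text, out_file):
    """Write one (tag, count) row per tag found in the JSON response text."""
    items = json.loads(response_text)["items"]
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["tag", "count"])
    for item in items:
        writer.writerow([item["name"], item["count"]])
    return len(items)


# Write to an in-memory buffer here; a real run would pass an open file.
buffer = io.StringIO()
rows_written = tags_to_csv(SAMPLE_RESPONSE, buffer)
print(buffer.getvalue())
```

Swapping the buffer for a file opened with `open("tags.csv", "w", newline="")` gives something Excel will open directly.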

Since tags are user-generated, they run the gamut. To provide a little bit of standardization, I applied some categories to group the tags. I completely made up the categories but tried to apply them consistently. Not all fell neatly into groups so I made a few judgement calls. For example, I threw tags about the Esri Flex API into “Esri Tools” whereas “Flex” went into “Development/Programming.” Some may quibble but that’s how I did it. Also, the raw numbers indicate the frequency of each tag. Since posts can have multiple tags, there can be a lot of overlap. For example, an individual post could be about ArcSDE (“Esri Tools”) for PostgreSQL (“General Database”) using the PostGIS (“Open-Source Geospatial Tools”) geometry type. So a single post could span a number of categories. The chart below shows how the top 100 tags fell into those categories.
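To make the overlap concrete, here is a small sketch of how one post ends up counted in several categories. The tag spellings, the "Uncategorized" fallback and the function name are my assumptions for the example; the three category assignments are the ones described above.

```python
from collections import Counter

# Hand-assigned categories, as described in the post. The exact tag
# spellings are assumptions made for this example.
CATEGORY_BY_TAG = {
    "arcsde": "Esri Tools",
    "postgresql": "General Database",
    "postgis": "Open-Source Geospatial Tools",
}


def category_totals(posts, category_by_tag):
    """Count tag occurrences per category across a list of per-post tag lists."""
    totals = Counter()
    for tags in posts:
        for tag in tags:
            # Tags without a hand-assigned category land in a catch-all bucket.
            totals[category_by_tag.get(tag, "Uncategorized")] += 1
    return totals


# One post carrying three tags, like the ArcSDE example above.
posts = [["arcsde", "postgresql", "postgis"]]
totals = category_totals(posts, CATEGORY_BY_TAG)
print(dict(totals))
print(sum(totals.values()), "category hits from", len(posts), "post")
```

That single post adds one hit to each of three categories, which is why the category totals should not be read as post counts.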

I’m not sure that it provides any particular insights into the geospatial community at this point. A true analyst, statistician, or data scientist can probably poke numerous holes in it. It would probably be good to revisit this in a year to see how things have changed, if at all. What it does say to me is that the GIS StackExchange site has succeeded in being a fairly neutral forum for discussion of GIS tools and concepts and that its user community has a diverse set of interests. Such a forum was sorely lacking before it came online, so that’s a win.

For those with a mind for such things, the top tag was “arcgis” with 1659 occurrences and the 100th tag was “export” with 58. The StackExchange API lets you get a maximum of 100 tags at a time, so I stopped at 100 for two reasons: 1) there was a significant drop-off between the top tag and the 100th, and I didn’t think the next 100 would add much, and 2) I had to assign categories by hand and 100 was as much as I had patience for. The data was downloaded on 13 September 2012 and will quickly become stale. The data that I used can be downloaded here.

It looks like my server is online again, so back to work…