Where I work, we have developed a nuanced philosophy to describe the niceties of collecting data, managing it, validating it, and preparing it for use: “Data is hard.”
This was brought to light in a very public manner by the vandalism that appeared on basemaps produced by Mapbox. The responses by Mapbox and their CEO, Eric Gundersen, are good examples of how a company should respond to such incidents. Kudos to him and the team at Mapbox for addressing and rectifying the situation quickly.
Speculation quickly ran to vandalism of OSM, which is one of the primary data sources used by Mapbox in their products. That speculation was backed up by the edit history in the New York area, but it is interesting to note that the vandalism was caught early in OSM and never came to light in OSM itself. In this case, the crowd worked as it was supposed to.
When was the last time you bought a CD? Come to think of it, when was the last time you plugged an iPod into your computer and synced music from iTunes?
That’s what I thought.
The fact that HERE may be for sale (publicly, which is somewhat unusual in the world of acquisitions) and that it languishes is really no surprise. (“Reviewing strategic options” is a vaguebooking/subtweeting way of saying “Make us an offer.”) HERE is the CD of navigation. Many years ago, I supported a customer that did a lot of multi-modal transportation analysis. In the pre-OSM world, you had TIGER and a handful of commercial data providers. (Remember ETAK?) This was around the time that in-vehicle navigation was becoming commonplace in personal vehicles. The data in those systems, from NavTech, was highly sought after but unavailable in standard GIS formats at the time. After a while, NavTech entered the GIS data realm, and its US product became the flagship commercial data set in the HSIP Gold database, a status it holds to this day. In some government circles, users clamored to get NavTech/Navteq/HERE data for their analysis. The rest of the world, however, has moved on.
It’s great news that the government shutdown is finally over. Many of our colleagues across the geospatial industry can now report back to work, ending another stressful period for them. During the shutdown, many stepped up to try to fill the gap left by shuttered government web sites that would normally distribute geospatial data.
There has been a bit of buzz the past couple of weeks over the ability of GitHub to render GeoJSON and TopoJSON files automatically using an embedded Leaflet map and MapBox technology. This buzz is quite justified, as it presents an easy way to publish and visualize vector data sets. In the weeks since the initial announcement, the community has begun exploring the limits of GitHub’s capability. Probably the two biggest limiting factors are individual file size limits and API rate limits. Some, including myself, are exploring strategies for maximizing the ability to store, disseminate, and visualize data within these confines. For the near term, GitHub will probably not be the place to store terabytes of data or act as the CDN for a high-volume mapping application. That is perfectly fine, and there is still a great deal of value to be found within GitHub’s current generous constraints.
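To make the workflow concrete, here is a minimal sketch of producing the kind of file GitHub renders: a small GeoJSON FeatureCollection written to disk, which GitHub will display on an embedded map once committed with a `.geojson` extension. The coordinates, property values, and filename below are illustrative placeholders, not anything from a real data set.

```python
import json

# A minimal GeoJSON FeatureCollection. GitHub renders files like this
# on an embedded map when they are committed with a .geojson extension.
# The point and its properties are illustrative placeholders.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is [longitude, latitude]
                "coordinates": [-77.0365, 38.8977],
            },
            "properties": {"name": "Example point"},
        }
    ],
}

# Writing the file is all the preparation needed; committing it to a
# GitHub repository triggers the map preview.
with open("example.geojson", "w") as f:
    json.dump(feature_collection, f, indent=2)
```

Because the rendered map is driven entirely by the committed file, keeping individual files small is what keeps you inside the size limits mentioned above; larger collections can be split across multiple files in the same repository.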