Belatedly, TUgis 2017 Keynote Text

In March of 2017, I gave the keynote address at TUgis, Maryland’s geospatial conference. Despite a few requests, I’ve been remiss about posting the text until now. The following lengthy block quote is the address as I had written it. It differs slightly from what I actually said, but I can no longer recall the on-the-fly changes I made. The original text is pretty close, though. It was written before some of the announcements by Microsoft and Amazon later in the year, but that simply serves to illustrate the pace at which the trends discussed here are moving.

Good morning and thank you. I’m Bill Dollins, Vice President of Engineering at Spatial Networks. Until recently, I was a partner at Zekiah Technologies in La Plata, Maryland.

Thanks to Ardys and the organizing committee for inviting me to speak today. It’s a great honor to address so many geospatial practitioners who are working to meet the challenges of our communities and our state. I should own up to the fact that this talk almost didn’t happen. The email invitation from Ardys got sorted into my spam folder and I only saw it because I went looking for another email I was expecting. More than once, I’ve imagined the awkwardness had I not responded and simply shown up today as an attendee.

As a guy who generally sticks to 5-minute lightning talks, I was a little intimidated at the idea of giving a keynote and I mentioned this to a good friend of mine who has done a variety of speaking engagements. He reminded me that I’ve been blogging about the geospatial industry for 10 years and that I might find a topic or two in there. In fact, his exact words were “I’ve stolen a few things from your blog. I think it’s okay if you do, too.”

At his suggestion, I did something I had never done before: re-read my own blog from beginning to end. The experience was a bit like binge-watching an old network television series. Themes that were more subtle over weekly airings practically jump out of the screen when you watch them back to back to back. What I saw was one man’s view of the recent evolution and democratization of GIS technology. I’ll explore that more in a bit, but first I’d like to share a story.

There was a time, early in my career, when I almost left the GIS field. Like many that I know, I came to GIS almost by accident. My background was actually programming, which I had done since I was very young. When I finally landed my first job out of college, I was introduced to GIS. In that job, I performed heads-up raster-to-vector data conversion of maps of Army bases.

After a few months, I was sent to Danvers, Massachusetts, for ARC/INFO training. Before leaving, I was talking with someone I considered a mentor. He had a background in IT and was the CIO of a company that owned a nationwide chain of hotels. The operation he oversaw was the very definition of “enterprise,” but he had no knowledge of GIS. So I explained it to him. Before long, he zeroed in on this ARC/INFO software I was going to learn. This was in 1994.

When I was done, he thought about it for a few moments and then declared that he could immediately think of several ways in which geography and location would be useful in helping him understand his business operations, but that he wouldn’t invest in GIS in a million years. Needless to say, I was shocked. He explained that, in order to make use of it, he’d need to hire specialized staff, purchase workstations that did not fit his existing architecture, introduce data sources that were incompatible with his in-place databases, and develop specialized processes for shipping data back and forth between the GIS operation and the rest of his organization. He summed it up by saying the insights he would gain simply could not justify the cost.

At this point, a career in GIS was not a foregone conclusion for me. That conversation really got me thinking. I was working with stovepiped data and proprietary scripting languages. I worried that staying there too long would limit my career. But off to Danvers I went. In that week of training, I worked with people from the Army Corps and small town governments who were genuinely excited about what they were going to be able to do with this software when they got home. That excitement helped and I decided to stick it out a little while longer.

I think we can all recognize the accuracy of that assessment of GIS at that point in time. GIS progressed pretty much unchanged for another decade or so, focused on its traditional user base and locked into its traditional business model. We began to see signs of change when Google Maps came online and challenged previous assumptions about how maps could be delivered to users.

Since then, many other segments of the information industry caught on to the value of location and location analytics. And like my mentor, those segments didn’t seem to have much use for GIS.

The web led the way. Google Maps was the first widely-known implementation of pre-rendered map tiles, an approach that ran completely counter to how map servers of all stripes behaved at the time. It also gave us a new map projection that, while controversial, demonstrably simplified the delivery of geospatial content to the only desktop application anyone wanted to use: the web browser.

The web’s design and delivery ethos demanded targeted information and focused, intuitive applications. This meant providing only the geospatial logic required, which, in turn, meant breaking up traditionally-monolithic geospatial software into smaller components. A few years into this transformation, my friend Brian Timoney began a series of posts on his blog, Mapbrief, titled “Why Map Portals Don’t Work.” It remains a must-read in the current age of fascination with open data portals.

The real problem with GIS wasn’t that it was not web-ready, although it wasn’t; it was that GIS was not scale-ready. The web, which was designed to deliver information at scale, was merely the canary in the coalmine. Let’s examine some numbers:

In an academic paper published in 1997, Michael Lesk estimated that all of the recorded information in the world to that point, if digitized, would amount to about 12,000 petabytes. This estimate included all text, photos, sound recordings, broadcast media, and telephony.

There are two important things to note in that observation. The first is the year. It’s reasonable to imagine that most of the information recorded up to that year was analog. The second is the word “recorded.” In the analog age, the recording of information was laborious and time-consuming. As a result, the act of recording information was itself a significant filter on that information. For example, think of analog photography versus digital photography. Today, people take dozens of selfies in one evening and many can be found on their phones years later. When you only had 24 exposures on a roll of film, you tended to be more judicious in choosing your shots. So it’s reasonable to assume that the 12,000 petabytes cited is a fraction of the information generated through human history to that point.

In 2010, Eric Schmidt, then CEO of Google, estimated that we generated as much information every two days as we had in all of human history up to 2003. It’s worth noting that his estimate occurred before many of today’s social media platforms existed and before those that did exist, such as Facebook and Twitter, had reached the level of saturation we see today.

So, every two days or so, the world generates 12,000 petabytes of new information, though more recent estimates place that number much higher. Thanks to the ease of digital technology and the widespread availability of storage, much of it is recorded easily and automatically. I will refrain from reciting any famous, but apocryphal, statistics about how much of that information is spatial. I think we will all agree that there is a lot of spatial content in there.

This is, of course, big data. It is not just a buzzword, but very real. These days it comes from many sources: digital print media and feeds, streaming video and audio, photos, sensors, drones, and internet-connected devices. In our own field, LIDAR and imagery are increasingly available. And, of course, social media platforms account for huge amounts of the information produced today.

Data of this size and velocity can present numerous challenges. It has also consistently proven our assumptions wrong. At the outset, many responded by suggesting that big data wasn’t useful and needed to be shrunk down to be analyzed meaningfully. This response is somewhat understandable, as it is an attempt to make this new environment fit into existing paradigms. By insisting that data is less useful when it is big, we try to reduce it to a size that works with our existing tools, including our favorite GIS packages.

But data at scale provides many opportunities. By having all of the data available, we can build more accurate models of the behaviors and conditions we wish to understand.

Performing real-time sentiment analysis of geo-located tweets during a natural disaster can provide instant insight into the need for response. Going beyond that, overlaying those tweets with open data sets, such as coverage areas from the National Broadband Map, can help identify areas where data and communications may be down and therefore more in need of physical inspection.
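
To make that concrete, here is a rough sketch in Python of the overlay step, using Fiona and Shapely (both of which appear in the tool list later in this talk). The file name, the tweet feed, and the pre-computed sentiment score are hypothetical placeholders, not a reference to any particular system.

```python
# A minimal sketch: flag geo-located tweets that fall outside broadband
# coverage polygons, suggesting areas that may need physical inspection.
# File names and the tweet stream are hypothetical placeholders.
import fiona
from shapely.geometry import shape, Point

# Load broadband coverage polygons (e.g., exported coverage areas)
with fiona.open("broadband_coverage.shp") as src:
    coverage = [shape(feature["geometry"]) for feature in src]

def outside_coverage(lon, lat):
    """True if a point falls outside every coverage polygon."""
    pt = Point(lon, lat)
    return not any(poly.contains(pt) for poly in coverage)

# tweets: an iterable of dicts with lon/lat and a pre-computed sentiment score
def flag_for_inspection(tweets, distress_threshold=-0.5):
    for tweet in tweets:
        distressed = tweet["sentiment"] <= distress_threshold
        if distressed and outside_coverage(tweet["lon"], tweet["lat"]):
            yield tweet  # high-distress report from an area likely offline
```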

Thanks to big data, predictive analytics has become the new modeling and simulation. We can reduce the need for assumptions and equivalences used in traditional modeling and apply trend analysis to predict outcomes based on more complete historical data. Human-centric crowdsourcing efforts such as OpenStreetMap have clearly demonstrated that large-scale collaborative efforts can produce reliable, accurate, authoritative data. On the whole, that fact is now beyond question.

But we are already moving beyond even this model. The “three V’s” of big data – volume, variety, and velocity – mean that our data is now essentially crowdsourcing itself. Validation is derived from overwhelming statistical correlation nearly without human intervention.

A good example of this concept, one that happens to have a spatial component, is vehicle navigation. It has moved from essentially static routing based on attributes like speed limits and turn restrictions, through dynamic route generation based on manual crowdsourcing such as Waze, to real-time re-routing based on current traffic conditions. The final state is enabled by information gathered from our smartphones sitting in our cars as we move, or as we don’t, and it renders traditional actions, such as jurisdictions manually reporting road closures through a government website, obsolete.
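
As a hedged illustration of that progression, the sketch below uses the NetworkX library: the static graph carries free-flow travel times derived from speed limits, and live observations simply overwrite edge weights before each shortest-path query. The tiny road graph and the traffic feed are invented for illustration.

```python
# Sketch: static routing vs. re-routing on live traffic, using NetworkX.
# The road graph and the traffic observations are invented for illustration.
import networkx as nx

G = nx.DiGraph()
# Edges carry free-flow travel time in minutes (derived from speed limits).
G.add_edge("A", "B", time=4.0)
G.add_edge("B", "C", time=6.0)
G.add_edge("A", "C", time=12.0)

def apply_live_traffic(graph, observations):
    """Overwrite edge travel times with observed conditions (e.g., from probes)."""
    for (u, v), observed_minutes in observations.items():
        graph[u][v]["time"] = observed_minutes

# Static route, based only on speed limits and turn restrictions.
print(nx.shortest_path(G, "A", "C", weight="time"))   # ['A', 'B', 'C']

# A crash slows the B->C segment; re-run with live observations.
apply_live_traffic(G, {("B", "C"): 25.0})
print(nx.shortest_path(G, "A", "C", weight="time"))   # ['A', 'C']
```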

This type of dynamic analysis is made possible precisely because we have data at scale available to us, not because we have attempted to cull it down.

To achieve this kind of analysis more consistently, we must overcome the biggest challenge of data at scale: gravity. This has been the biggest problem for traditional GIS tools. In a traditional workflow, we prep our data and load it into a tool such as ArcGIS Pro or QGIS, perform our analysis, and generate new data or analytical products. There are still plenty of use cases for this approach among historical GIS user communities, but data at scale defies it.

Like a large object in space, data at scale has a gravitational pull, a phenomenon described by Dave McCrory in 2010, making it impractical to try to bring it to our tools. Instead, our tools must move to it. We have already seen this in the data science community, where tools such as R contain sophisticated geospatial analysis capability. Other tools, such as the open-source Turf JavaScript library, enable asynchronous geospatial analysis of high-velocity data as it moves through a system. Such libraries are modular, allowing us to embed only the tools we need, without additional dependencies adding unneeded overhead to the system.
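
Turf itself is JavaScript, but the same “analyze the data where it flows” pattern can be sketched in Python with asyncio and Shapely; the queue and the feature structure below are hypothetical.

```python
# Sketch of in-stream, asynchronous analysis: enrich each feature as it
# flows through a queue, with no round trip through a desktop GIS or a
# staging database. The queue contents are hypothetical GeoJSON-like dicts.
import asyncio
from shapely.geometry import shape

async def enrich_features(queue: asyncio.Queue, results: list):
    """Consume features and attach a bounding box in place, forever."""
    while True:
        feature = await queue.get()
        geom = shape(feature["geometry"])
        feature["properties"]["bbox"] = list(geom.bounds)  # (minx, miny, maxx, maxy)
        results.append(feature)
        queue.task_done()
```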

With analytics being performed in place, traditional desktop tools are left to perform static cartographic and visualization functions. But these functions, too, are being performed more often in targeted environments, such as Mapbox Studio, or even vector graphics editors such as Inkscape.

In this paradigm, our geospatial tools are being broken apart and distributed across the new problem set of dealing with data at scale. In order to meet these new challenges, GIS must become more diffuse and embed itself into traditional and emerging information architectures.

Under way for at least a decade and accelerating, this democratization of GIS tools is both inevitable and necessary. Location is too valuable to go unexploited in the volumes of data we now generate.

By now, you may have noticed the images looping on the screen. The map images interspersed through them were produced using an array of tools that aren’t typically the subject of conversation in many GIS shops:

Elastic, nlextract, RStudio, plasio, greyhound, potree, WebGL, Cesium, shapely, Fiona, Pix4D, GIMP, glTF, and many others.

For the past two years, I’ve had the pleasure of curating the GeoHipster calendar and many of the map images you see came from calendar submissions. I’ve been impressed and somewhat humbled by the imagination of design, content, and subject matter evident in those maps. Which brings me to my last topic.

As strong as the process of change has been, there has been one clear constant…and I am looking at it now. Our GIS tools have not changed themselves. They have been changed by geospatial practitioners like those in this room.

Many significant changes in technology were initiated by people working in fields outside geography. Like my mentor years ago, they understood the importance of location but had very little need for GIS as they found it. But that’s not to say that geography and geographers haven’t had a few things to say.

The web has come through its “Web Mercator” years to understand the value of projections, as seen in the support for a wide range of projections in D3. If you’re going to display range rings for a North Korean missile test on your news site, you need to understand the significance of azimuthal and stereographic projections. And, although the news media may not know it, post-election discussion of the electoral college and the popular vote is really a discussion of the modifiable areal unit problem.
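
For the curious, here is a small illustrative sketch of that point in Python, using pyproj to compute a geodesic range ring on the ellipsoid rather than drawing a naive circle in Web Mercator; the launch point and range are placeholder values.

```python
# Sketch: a geodesic range ring computed on the WGS84 ellipsoid with pyproj,
# rather than a circle drawn naively in Web Mercator. The coordinates and
# range are placeholder values for illustration only.
from pyproj import Geod

geod = Geod(ellps="WGS84")

def range_ring(lon, lat, range_m, step_deg=1):
    """Return (lon, lat) vertices of a circle of true ground distance range_m."""
    ring = []
    for azimuth in range(0, 360, step_deg):
        end_lon, end_lat, _ = geod.fwd(lon, lat, azimuth, range_m)
        ring.append((end_lon, end_lat))
    return ring

# e.g., a 1,000 km ring around a hypothetical launch point
vertices = range_ring(129.0, 40.0, 1_000_000)
```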

While GIS tools are becoming more diffuse and GIS is becoming less distinct as a technological entity, the geographic knowledge base is becoming more important. We, as geospatial professionals, are not defined by the tools we use.

Let me say that again: We, as geospatial professionals, are not defined by the tools we use.

It’s actually quite the opposite. GIS tools are merely a concrete representation of the knowledge of our community. And while we can encapsulate our knowledge in tools and automate spatial analysis and embed it all in traditional information systems, we cannot yet automate our understanding of the appropriate use of our knowledge, and we cannot yet automate the innovative application of geography to new problem sets. That remains firmly our domain.

Change, done correctly, is always a little uncomfortable. We get a little nervous when we attempt to perform familiar tasks with unfamiliar tools. Will I be able to produce a result? Will it be valid? Do I know Python well enough? I think we’ve all asked ourselves those questions a few times.

This is an exciting time to be in the field of GIS. As the technology landscape changes rapidly, the people of GIS provide the stability and leadership to make sense of it all. There has never been a better time to know how to apply geographic knowledge.

That knowledge, combined with accessible geospatial tools, enables innovation. It enables a small shop like Hobu, Inc., whose work is also featured in the scrolling images, to develop software systems and data management services for large-scale point cloud data, providing point cloud classification and analysis in the browser. It enables a startup like CityZenith to fuse GIS, BIM, and gaming engines for full in-browser analysis and visualization. And, finally, it enables much of the innovation you will see today.

Whether it’s NextGen 911 or redistricting or distribution of open data, Maryland is tackling a lot of vexing issues related to geography and geospatial technologies. But, just as we are not defined by our tools, neither are we limited by them.

We are limited only by our imaginations, and our willingness to collaborate with each other, and our ability to pick the right tool for the job at hand, and our willingness to share our solutions widely.

Thank you.