I’m a CIO. That means a typical day can involve a range of activities: responding to data calls for security audits, reviewing SOC2 reports of prospective platform vendors, managing subscriptions of existing vendors, handling GDPR requests, ensuring data from corporate systems is meaningful and relevant for our internal stakeholders, attending meetings of various sorts.
Somewhere down in the belly of all that, I occasionally get to write some SQL in our data warehouse platform, BigQuery and maybe publish a report in a BI tool. There’s almost always a need for some spatial analysis, so the spatial functions in BigQuery come in handy. For example, I recently wrote a custom function to geolocate IP addresses as they appear in our data in order to do localized sales tax estimates.
The astute reader will note that there are some terms missing from the above description of my day: GIS, PostGIS, ArcGIS, QGIS, and so on. It’s rare that I crack open any traditional GIS tool anymore, unless I am working a personal project.
My company is primarily standardized on Google Workspace. That said, I also pay for a sizable footprint of Office 365. The primary driver of that additional expense is Excel. Those who do real financial work in my company have always insisted on it and I couldn’t pry it from their cold, dead fingers.
For the past year, I have been working on an MBA program and I am finishing up a financial analysis course as I write this. It’s helped me understand why our finance team is so insistent on Excel as I have run into a few built-in functions that have no real equivalent in Google Sheets. (Note: Google tends to quietly sneak new functions into Sheets over time, so it’s always worth checking back in with it.) But, during this MBA program, I’ve gotten back in touch with just how powerful Excel in particular, and spreadsheets in general, can be for data analysis.
As a person who has made my living with databases for my whole career, I have had a tendency to look down on spreadsheets. The ugly un-normalized data, the random column headers, the dynamic schemata, the multiple data sets in one sheet – the list goes on. The one thing I cannot argue with is the accessibility of spreadsheets. You open it up and start adding data. The rich function libraries enable value-added analysis right away. If “low-code” is the hotness right now, then every platform calling itself low-code has a lot to learn from Excel and its brethren.
Spreadsheets also embody the ethos of convention over configuration. There’s no rule that I’m aware of that says pasting a formula elsewhere in a sheet should automatically update the target cell ranges, but all spreadsheets behave this way. Because I know Excel, I can be instantly productive in Google Sheets or LibreOffice Calc.
So I’ve rediscovered a respect for spreadsheets. That’s cool. Good for me.
Circling back to GIS, though, I’ve come to realize that spreadsheets are the vast, unconquered frontier for GIS. For every person working in a database or Python or RStudio, there’s probably 10 to 100 others doing analysis in a spreadsheet. Having some support for spatial types and functions in spreadsheets would greatly expand the democratization of spatial analysis that has pervaded databases, programming, notebooks, and business intelligence platforms. So what could that look like? I see something like this:
=COUNTIF(A1:A15, ST_CONTAINS(Sheet2!A1:A200))
A1:A15 of the current sheet could contain WKT strings of points and A1:A200 of Sheet2 could contain WKT strings of polygons. Sure, WKT is ugly, but spreadsheet users have demonstrated an astounding ability to tease accuracy and productivity out of ugly data. Maybe it’s time to set our purity tests aside and get out of their way.
The core of GIS is spatial analysis. At its most basic, that means determining the relationships between geometries and taking action based on those relationships and that can, and often does, occur independently of maps. Traditional desktop GIS tools have a tendency to tightly couple spatial analysis and maps. Tools like PostGIS, Shapely, Jupyter Notebooks, and RStudio have broken that coupling in a practical way. With approximately 750 million users for Excel alone, spreadsheets are the best next place to extend this trend.