Data Over Software

One of the first tasks I ever had in my then-new GIS career was doing AML development in ARC/INFO 6.x for a data production project. My code parsed DXF exported from AutoCAD R11 for DOS and then assigned attributes based on things like layer, color, line weight, feature type, and others. It also georeferenced the data based on tic marks captured in AutoCAD. The end result was multiple ARC/INFO coverages that were fully populated from data templates based on the AutoCAD characteristics. From there, QA analysts tailored the data from defaults, if necessary.

After that, I did a lot of work in AML to build a cartographic production system for a water utility. That had me building a GUI using ARC/INFO forms and developing customized editing tools with ArcEdit in ArcPlot mode.

As you can imagine, I dug deeply into AML. I learned a lot about GIS – in which I had no formal training. Because AML essentially batched the same commands the analysts used at the command line, all of this development made me quite proficient with ARC/INFO. Those were fun times. Because I needed to learn GIS, this period had a lot of value for me.

As a software developer, however, there was a big drawback that is evident in the full name of AML – Arc Macro Language. All of the time and effort I was investing into building proficiency in AML was usable in exactly one place. The same was true when ArcView came along with its proprietary object-oriented language, Avenue.


Balancing Organizational Controls and Technical Controls in Data

Technical Controls – The security controls (i.e., safeguards or countermeasures) for an information system that are primarily implemented and executed by the information system through mechanisms contained in the hardware, software, or firmware components of the system.

Organizational Controls – The security controls (i.e., safeguards or countermeasures) for an information system that primarily are implemented and executed by people (as opposed to systems).

NIST-800

The definitions above come from the glossary of the NIST-800 series of cybersecurity publications. While they are focused on cybersecurity, the broader concepts – automated controls versus manual controls – are applicable elsewhere. Over the last couple of weeks, and especially since I attended the TUgis conference, I have been thinking about these concepts in terms of data in general and schema in particular.

I find schema to be an interesting concept. The term “schema” is fairly wide-ranging in its definition but it can be defined as “an underlying organizational pattern or structure; conceptual framework.”


TUgis Wrap-Up

Earlier this month, I attended TUgis, Maryland’s annual GIS conference. It was my first time attending since I gave the keynote address in 2017. My absence was due primarily to the conference being moved to early August – a reasonable adjustment, since the venue is always Towson University and the new timeframe takes advantage of students still being away on break. That timeframe also usually coincides with my family’s annual vacation. The other reason for my long absence was the pandemic.

This year, the conference occurred right before our vacation, so I was able to squeeze it in – though I had to leave halfway through the second day to finish travel preparations. For me, the conference was a chance to catch up with a number of people I hadn’t seen in quite a while – all of whom I mentioned over on LinkedIn. I especially enjoyed catching up with a couple of my former Fulcrum co-workers whom I had worked with for my entire tenure there. Those were exceptionally meaningful years for me and I feel like we grew a lot together.

As for the conference itself, I attended the public safety special interest group and a few other sessions. As a recovering programmer, it’s always interesting to see the software solutions people develop – either from scratch or customizing some other software. At TUgis, that other software tends to be some form of Esri application, though there were a few mentions of open-source tools as well.


Stripe API Pagination in FME

A few weeks ago, I described an integration I built to pull data from Stripe via its API and load it into BigQuery. There were two main problems with this approach: First, it was incredibly hacky – a Wile E. Coyote approach to the problem involving cron jobs and EC2 instances and GCS uploads and scheduled jobs in BigQuery. It got the job done, but in a way that was slightly embarrassing. Second, everything else we did was leveraging FME and this stood outside of that pattern.

I really wanted to bring this process into the FME fold, but I needed to tackle Stripe’s API pagination in order to do it. Most of Stripe’s helper libraries have an auto-pagination feature that allows you to keep loading objects until you reach the end. This feature is not available if you are using the raw Stripe REST API, which is essentially all that is available to FME via the HTTPCaller transformer.

Stripe’s API uses cursor-based pagination, meaning that it points to the ID of the next object to read in the dataset. Luckily, Safe Software has an article that describes how to handle cursor-based pagination. The catch is that the article uses the Slack API as an example. The Slack API is well designed and returns everything you need to handle pagination in one place in the response document, but this isn’t the case with Stripe’s API so I had to modify the approach somewhat.
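To make the cursor idea concrete, here is a minimal Python sketch of the loop, not the FME workspace itself. It assumes the response shape I understand Stripe's list endpoints to use: a `data` array of objects, a `has_more` flag, and a `starting_after` parameter that takes the ID of the last object already received. The names `iter_all` and `make_fake_endpoint` are hypothetical, and the fake endpoint stands in for a real HTTP call so the example runs on its own.

```python
from typing import Callable, Iterator, List, Optional


def iter_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every object from a cursor-paginated list endpoint.

    fetch_page(cursor) is assumed to return a dict with a "data" list and
    a "has_more" flag. The cursor is not handed back in one place, so it is
    derived from the ID of the last object on the current page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        items = page["data"]
        yield from items
        if not page.get("has_more") or not items:
            break
        cursor = items[-1]["id"]


def make_fake_endpoint(records: List[dict], page_size: int):
    """Build a stand-in for the remote API (hypothetical, for illustration)."""
    ids = [r["id"] for r in records]

    def fetch_page(starting_after: Optional[str]) -> dict:
        # Resume just past the object named by the cursor, if one was given.
        start = 0 if starting_after is None else ids.index(starting_after) + 1
        chunk = records[start:start + page_size]
        return {"data": chunk, "has_more": start + page_size < len(records)}

    return fetch_page


records = [{"id": f"obj_{i}"} for i in range(7)]
fetched = list(iter_all(make_fake_endpoint(records, page_size=3)))
```

In an FME workspace the same logic would presumably be spread across a looping transformer around HTTPCaller: read the last ID from the response, check the "more results" flag, and feed the ID back in as the next request's cursor.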


Reflections, Twenty-One Years On

Yesterday was the 21st anniversary of 9/11. I tend to let that day go by without comment. My recollections of the day itself add nothing as I was 50 miles outside of DC at the time. Even that far away, the roads were filled with panicked people and the phone networks were crashing, but I wasn’t in the city and I have nothing to add about that day.

Twenty-one years ago today, I was driving back home with my family and, as we crossed the Harry Nice Bridge from Virginia back into Maryland, it was flanked on either side by armed boats from local law enforcement and the National Guard. At that time, I was a contractor supporting an infrastructure protection program for the Department of Defense. There was no clearer illustration of the importance of what we did than those boats on that day.


Data Is Hard

Where I work, we have developed a nuanced philosophy to describe the niceties of collecting data, managing it, validating it, and preparing it for use: “Data is hard.”

This was brought to light in a very public manner by the vandalism that was displayed on basemaps produced by Mapbox. The responses by Mapbox and their CEO, Eric Gundersen, are good examples of how a company should respond to such incidents. Kudos to him and the team at Mapbox for addressing and rectifying the situation quickly.

[Image: The Gordian Knot. By jmerelo, CC BY 2.0 (https://creativecommons.org/licenses/by/2.0), via Wikimedia Commons]

Speculation quickly ran to vandalism of OSM, which is one of the primary data sources used by Mapbox in their products. That speculation was backed up by the edit history in the New York area, but it is interesting to note that the vandalism was caught early in OSM and never came to light in OSM itself. In this case, the crowd worked as it was supposed to.


Thoughts on HERE

When was the last time you bought a CD? Come to think of it, when was the last time you plugged an iPod into your computer and synced music from iTunes?

That’s what I thought.

The fact that HERE may be for sale (publicly, which is somewhat unusual in the world of acquisitions) and that it languishes is really no surprise. (“Reviewing strategic options” is a vaguebooking/subtweeting way of saying “Make us an offer.”) HERE is the CD of navigation. Many years ago, I supported a customer that did a lot of multi-modal transportation analysis. In the pre-OSM world, you had TIGER and a handful of commercial data providers. (Remember ETAK?) This was around the time that in-vehicle navigation was becoming commonplace in personal vehicles. The data in those systems, NavTech, was highly sought after but unavailable in standard GIS formats at the time. After a while, NavTech entered the GIS data realm, and its US product became the flagship commercial data set in the HSIP Gold database, a status it holds to this day. In some government circles, users clamored to get NavTech/Navteq/HERE data for their analysis. The rest of the world, however, has moved on.


DevOps for Geospatial Data

There has been a bit of buzz the past couple of weeks over the ability of GitHub to render GeoJSON and TopoJSON files automatically using an embedded Leaflet map and MapBox technology. This buzz is quite justified as it presents an easy way to simply publish and visualize vector data sets. In the weeks since the initial announcement, the community has begun exploring the limits of GitHub’s capability. Probably the two biggest limiting factors are individual file size limits and API rate limits. Some, including myself, are exploring strategies for maximizing the ability to store, disseminate, and visualize data within these confines. For the near term, GitHub will probably not be the place to store terabytes of data or act as the CDN for a high-volume mapping application. That is perfectly fine and there is still a great deal of value to be found within GitHub’s current generous constraints.
