As I’ve recounted before, what became HIFLD started as the M: drive on a Windows server in a musty government building in Norfolk, VA. Early exercises made it obvious that the data on our M: drive didn’t match the data on other M: drives. They also made it clear that sharing data, especially across 2002-vintage government networks, was painful. (Picture shapefiles split across multiple email messages to stay under the 5MB attachment size limit.)
So a couple of us stood up an ArcIMS server to share some of the data. Partner organizations could see, in what passed for real time back then, what we had and give us feedback. Some of them stood up their own servers so we could do the same. IT policies prevented feature streaming, but feedback loops shortened noticeably.
Then we had to sort out how to enable state and local partners to participate. Then we had to sort out how to license commercial data sets across the community. Each win exposed a new shortcoming and, with each shortcoming, the “we” involved in finding a solution expanded.
We briefly ran down the path of attempting to standardize data schemas. That was as bad an idea as you are imagining. HIFLD soon became a working group that met regularly. Attendance, and interest, grew with each meeting. Feedback began to be collected and processed in a more structured manner.
Less-than-ideal intermediate steps were adopted while bigger issues were solved. Some longtime HIFLD users will recall the quarterly shipments of DVDs on which data was distributed while NGA took the lead on trying to build the GII. Eventually, the community was able to sunset that practice. Every step of the way, imperfect solutions were shipped to trigger feedback, which drove iterative, incremental improvements.

I am reminded of this after an evening discussion about ongoing data preservation efforts. Many federal data sets and sites have gone dark. Well-intentioned individuals and organizations scraped what they could get before everything went offline. A lot of valuable data is sitting in the 2025 equivalent of a bunch of M: drives, and the community is trying to figure out how to reconcile it and get it online. There’s talk of cloud-native this and STAC that and AI whatever. These are all good things, but here’s my advice:
Just ship it.
If you have preserved data, put it in an S3 bucket and let people get at it. Do something deeply unsexy like standing up a GeoServer instance. Get it all out there in the open and let interested parties see it, touch it, and begin figuring out what is where.
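To make “just ship it” concrete, here is a minimal sketch of that first step in Python with boto3. The bucket name and local directory are hypothetical placeholders, and it assumes AWS credentials are already configured; it is an illustration of how little ceremony is required, not a prescription.

```python
# Minimal sketch: push preserved files to an S3 bucket so people can get at them.
# Assumes boto3 is installed and AWS credentials are configured.
# BUCKET and LOCAL_DIR are hypothetical placeholders.
import pathlib

import boto3

BUCKET = "preserved-data-example"         # hypothetical bucket name
LOCAL_DIR = pathlib.Path("scraped-data")  # wherever the rescued files landed

s3 = boto3.client("s3")

for path in LOCAL_DIR.rglob("*"):
    if path.is_file():
        # Mirror the local directory structure as S3 keys.
        key = path.relative_to(LOCAL_DIR).as_posix()
        s3.upload_file(str(path), BUCKET, key)
        print(f"shipped s3://{BUCKET}/{key}")
```

Making the bucket publicly readable (modern S3 blocks public access by default, so that means relaxing the block-public-access settings and attaching a bucket policy granting s3:GetObject) is one more console or CLI step. The point stands: the upload is an afternoon of work, not an engineering project.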
It will suck for a while, but it won’t suck nearly as much as the data continuing to be dark while a solution is engineered. The solution can be engineered while the data is made available in profoundly imperfect ways. Once the community can see it, the feedback will start. The number of “what-if” discussions will drop to nearly zero and we’ll be able to move forward.
In 1965, Bruce Tuckman outlined a four-stage model of group development (forming, storming, norming, performing). All four stages are necessary. The data preservation community is currently in the forming stage, and it is essential that it pass through the storming stage in order to mature. In this case, storming involves getting the scraped data out into the world so that real, honest feedback and requirements can be surfaced. There is no shortcut and no benefit to waiting.
In the case of HIFLD, the sudden, clarifying event was a terrorist attack that forced us to think and collaborate in new ways. Today, that event is the implementation of astoundingly regressive public policies that have us thinking about how we can stand on our own without sponsorship from where it has always come. That is the real question; formats and infrastructure are small pieces of the answer.
Embrace the suck, put the data out there, and let’s get going.
Header image: Rubbish computer, CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0), via Wikimedia Commons