FedGeoDay 2026: Four Talks Worth Your Attention

Summaries of selected talks from FedGeoDay 2026, Day 1, April 2026, US Census Bureau, Suitland, MD

Once again, I served on the FedGeoDay organizing committee this year. FedGeoDay continues to be one of the higher-value events on my calendar, and this year was no exception. The program focused on data preservation and federal data stewardship, and out of an information-rich lineup, four talks stood out to me in particular. From the opening keynote by Denice Ross to three well-constructed lightning talks that punched well above the format’s weight, they reinforced the value of federal geospatial datasets and highlighted how essential those assets are becoming to the AI industry. I will discuss each in this post.


Denice Ross: The Baker’s Dozen: Making the Case for Federal Geospatial Data

Denice Ross, former US Chief Data Scientist, now with the Federation of American Scientists and founder of EssentialData.US, gave the day’s opening keynote, and it was deliberately not aimed at the room. Her audience, she explained, was not data users. It was everyday Americans who have no idea these datasets exist or why they should care.

Her argument was that the geospatial community is sitting on one of the most compelling public interest stories in government, and it has done a remarkably poor job telling it. Not to policymakers or researchers. To regular people. The football coach. The mom of four. The small business owner who just survived a flood.

To make the case, she walked through a “baker’s dozen” of federal geospatial datasets, chosen specifically because each one has a story that lands outside of Washington.

A few highlights:

USGS North American Bat Monitoring Database. Bats provide roughly $53 billion per year in free pest control to American farmers by eating the insects that would otherwise destroy crops. This database tells you where the bats are, which speeds up permitting for wind farms, mining operations, and highway overpasses. Ross noted that Michael Lewis was reportedly interested in including bat data in his next book. That is the kind of story that travels.

NOAA NEXRAD. The real-time radar data that pilots use to avoid bird strikes is, ultimately, the data that helps prevent another plane from going down in the Hudson River.

NOAA Argo Buoy Fleet. Autonomous shipping startups building captainless cargo vessels train their AI route-planning algorithms on data collected by this fleet of ocean-drifting buoys. Federal oceanographic data, it turns out, is part of the supply chain for the future of global trade.

DHS HIFLD Open. Ross described HIFLD Open as the “digital go bag” that the New Orleans GIS team used to plan for Super Bowls, Taylor Swift concerts, and hurricanes. These are all variations on the same problem: lots of people from out of town who do not know the local geography, while local officials need to know where the hospitals, hazards, and vulnerable populations are.

USFS National Fire Danger Rating System. The wooden sign at the entrance to the national park that tells you whether you can have a campfire tonight? That is a direct readout of a federal dataset.

Ross also highlighted OMB’s M-25-05 memo, which formally requires federal agencies to engage the public around the value of their data, not merely publish it and hope someone finds it. She called on the geospatial community to submit use cases to EssentialData.US, to think about who benefits from federal data rather than only who uses it, and to consider hosting one-hour use case workshops that turn data experts into data storytellers.

Her closing provocation was that we have been asking the wrong question. “What data do you need?” always produces an impossible wish list. The better question is what a modern society actually needs to know. From there, work backward to the data.

There are few people more knowledgeable or better equipped than Denice Ross to articulate the value of federal datasets in real terms. Her eye-popping valuation of the bat dataset alone illustrates the asymmetric value the nation receives from this kind of public data asset. Her talk deftly set the stage for the next two days.


Jerry Johnston: Earth Observation Foundation Models Are Here, and They’re Cheap to Use

Jerry Johnston leads the public sector geospatial technology practice at Deloitte, and he used his five minutes to do something genuinely useful: explain what Earth observation foundation models actually are and why the people in that room should care about them.

The core idea maps neatly onto something everyone already understands. GPT-4 is the foundation model that ChatGPT sits on top of. Earth observation foundation models work the same way, except that they encode satellite imagery instead of language. Take an enormous volume of multispectral imagery, compress it into dense vector embeddings that capture underlying patterns and variability, and you now have a searchable, analyzable representation of the Earth’s surface that you can query very cheaply.

The important word is cheaply. Training the model on tens or hundreds of millions of images is the expensive part, and that work has already been done. Bringing your own new imagery into the model, generating embeddings, and running analysis on top of them is relatively fast and low-cost. You are taking advantage of an investment someone else already made.
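
As a rough illustration of why the downstream step is so cheap, consider what a similarity query over precomputed embeddings amounts to. This is a conceptual sketch, not Clay’s actual API; the embedding files, shapes, and the “find more tiles like this one” framing are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical inputs: a precomputed bank of tile embeddings (one row per
# image tile) and the embedding of a single query tile. In a real workflow
# these vectors would come from a foundation model; here they are stand-ins.
tile_embeddings = np.load("tile_embeddings.npy")   # shape: (n_tiles, dim)
query_embedding = np.load("query_tile.npy")        # shape: (dim,)

# Normalize so the dot product becomes cosine similarity.
tile_embeddings = tile_embeddings / np.linalg.norm(tile_embeddings, axis=1, keepdims=True)
query_embedding = query_embedding / np.linalg.norm(query_embedding)

# One matrix-vector product scores every tile against the query tile.
# This is the "cheap" step: no model inference, just linear algebra.
scores = tile_embeddings @ query_embedding

# Top 10 most similar tiles, e.g. "find more places that look like this one."
top_k = np.argsort(scores)[::-1][:10]
print(top_k, scores[top_k])
```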

Johnston’s team has been working extensively with Clay, an open-source foundation model trained on more than 70 million satellite images from Sentinel-1, Sentinel-2, PlanetScope, NAIP, and Landsat. These are the workhorse datasets that the federal geospatial community has been building for decades. Clay is multimodal: it ingests imagery, place names, OpenStreetMap data, and other contextual inputs to generate richer embeddings.

Two use cases from Deloitte’s work stood out. First, they brought 13,000 new images into Clay and processed them in 28 minutes to detect change along the Syria-Jordan border; that kind of throughput makes near-real-time analysis at global scale feasible at a fraction of the cost of traditional approaches. Second, they used Clay’s SAR (Synthetic Aperture Radar) embeddings for artisanal mining detection. Johnston’s team can now find the subtle surface deformations and soil moisture signals, invisible to the human eye, that indicate artisanal mining activity anywhere in the world.
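
The general mechanic behind embedding-based change detection can be sketched in a few lines. This is illustrative only, not Deloitte’s pipeline; the input files, threshold, and dates are hypothetical.

```python
import numpy as np

# Hypothetical embeddings for the same grid of tiles at two acquisition
# dates, e.g. before and after the period of interest.
emb_t0 = np.load("tiles_2025_01.npy")  # shape: (n_tiles, dim)
emb_t1 = np.load("tiles_2025_06.npy")  # shape: (n_tiles, dim)

def cosine_distance(a, b):
    """Per-tile cosine distance between two embedding sets."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)

change = cosine_distance(emb_t0, emb_t1)

# Flag the tiles whose embeddings moved the most; these are candidates for
# human review, not a final answer. The percentile cutoff is arbitrary here.
candidates = np.where(change > np.percentile(change, 99))[0]
print(f"{len(candidates)} tiles flagged for review")
```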

The larger implication is that the federal geospatial community has spent decades building the datasets that are now the training corpus for these models. The return on that investment is arriving, but only for organizations willing to engage with the open-source tooling.

The industry has been talking about geospatial foundation models for a couple of years now, mostly ineffectively. The typical discussion rarely answers the abstract question of “What are they?” in a way that the casual user understands. If you cannot bring that home, “Why should I care?” is a lost cause. Jerry put that abstraction to bed in the tightest five minutes of the day.


Jason Gilman: The Context Window Problem and a Fix

Jason Gilman, director of AI applications at Element 84, gave the most practically useful engineering talk of the day. His subject was the specific way AI agents break when you point them at large federal datasets, and how to fix it.

The failure mode is simple and reproducible. You build an agent. You give it tools to query the FEMA disasters database, the Census, a hazards database, and a geocoder. You ask it something like: “Are low-income communities more affected by natural disasters than other communities?” The agent starts working. To answer the question, it tries to pull the entire FEMA disaster history into its context window. The context window overflows. The agent fails.

The obvious fix is to break the data sources into sub-agents, one per dataset, but that actually makes things worse. The FEMA sub-agent still has to handle a massive amount of data. The results still have to be joined at the orchestrating agent. You are now doing data joins in natural language, which is the wrong abstraction for the job. The system is also slower and harder to debug.

A better fix is to let the agent work with a coding environment, such as Python or Bash, where data manipulation, filtering, and joining happen outside the context window, using the right tools for the job. This gets most of the way there.
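
As a hedged illustration of what that looks like in practice, here is the kind of code an agent might write and execute in a sandboxed Python session. The files and column names are hypothetical stand-ins, not the real FEMA or Census schemas; the point is that the heavy rows never touch the context window.

```python
import pandas as pd

# Hypothetical extracts the agent has already fetched to local files; the
# column names are illustrative, not the actual FEMA or Census schemas.
disasters = pd.read_csv("fema_disasters.csv")       # county_fips, year, declarations
income = pd.read_csv("census_median_income.csv")    # county_fips, median_income

# Join and aggregate with ordinary dataframe operations. None of these rows
# ever enter the model's context window.
merged = disasters.merge(income, on="county_fips", how="inner")
merged["low_income"] = merged["median_income"] < merged["median_income"].median()

summary = (
    merged.groupby("low_income")["declarations"]
    .agg(["count", "mean"])
    .rename(columns={"count": "counties", "mean": "avg_declarations"})
)

# Only this small summary table goes back to the agent for interpretation.
print(summary)
```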

But Gilman’s main contribution was identifying what even this approach misses: ephemerality. When the conversation ends, the work disappears. Any derived datasets the agent created, any intermediate results, any code it wrote to produce those results: gone. Start a new conversation and none of it is accessible. This matters a lot in federally funded research contexts, where you need to be able to show your work and reproduce your results.

His solution is agentic data storage, a persistent, user- and project-scoped storage layer that lives outside any individual conversation. The agent can save derived datasets, including files, binary data, and tables, to this storage, access them in future conversations, and automatically generate metadata about how each artifact was produced. That includes which API calls were made, what code was run, and what data was used as input. The result is a complete provenance trail from raw source data to final output, reproducible end-to-end.
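
Gilman did not walk through implementation details, so the following is only a minimal conceptual sketch of what a user- and project-scoped store with provenance sidecars could look like. All names and paths are hypothetical, not Element 84’s implementation.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

class AgentDataStore:
    """Toy persistent store: each saved artifact gets a provenance sidecar."""

    def __init__(self, root: str, user: str, project: str):
        self.root = Path(root) / user / project
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, data: bytes, code: str,
             inputs: list[str], api_calls: list[str]) -> Path:
        artifact = self.root / name
        artifact.write_bytes(data)
        provenance = {
            "created": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(data).hexdigest(),
            "inputs": inputs,        # source datasets used
            "api_calls": api_calls,  # which APIs were hit
            "code": code,            # the code that produced this artifact
        }
        (self.root / f"{name}.provenance.json").write_text(json.dumps(provenance, indent=2))
        return artifact

# A later conversation can list these artifacts and read how each was made.
store = AgentDataStore("agent_store", user="analyst", project="disaster-equity")
store.save(
    "county_summary.csv",
    b"county_fips,avg_declarations\n",
    code="summary = merged.groupby('low_income')['declarations'].mean()",
    inputs=["fema_disasters.csv", "census_median_income.csv"],
    api_calls=["hypothetical FEMA and Census API requests"],
)
```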

Gilman’s framing was useful: the difference between an agent that answers your question and an agent that does science is persistent, traceable storage.

Geospatial data is big and chews up context windows quickly. I am dealing with that in my own work at Clairvoyint AI. Jason is quietly one of the deepest experts in this area right now, and he calls out the fact that a lot of current strategies simply move the context problem around rather than solve it. His talk showed how traditional deterministic code and probabilistic LLMs can work together to make each other more effective.


Leo Thomas: LLM Agents as Geospatial Translators

Leo Thomas works with Development Seed, and he came at the LLM-agents-for-GeoAI question from a different angle than Gilman. Gilman focused on the engineering problem. Thomas focused on the access problem.

His starting observation is straightforward. There is an enormous gap between the potential value locked in geospatial data and the actual use being made of it. That gap exists because doing anything meaningful with geospatial data requires technical expertise: finding the data, accessing it, loading it, and analyzing it. Most of the people who would benefit from the data do not have that expertise. As an illustration, Thomas showed a simple Python script that searches a STAC catalog, loads one dataset, and computes a monthly average. It is 60 lines of code and requires four open-source libraries. A climatologist should not need to know any of that.
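
For concreteness, here is a condensed sketch of that kind of workflow. It is not Thomas’s script; the catalog endpoint, collection, bands, and bounding box are placeholders. Even the short version assumes the user knows STAC search semantics, band naming, projections, and xarray resampling.

```python
from pystac_client import Client
from odc.stac import load

# Placeholder STAC endpoint and collection; swap in whatever catalog you use.
catalog = Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-100.5, 41.0, -100.0, 41.5],          # illustrative patch of grassland
    datetime="2020-01-01/2024-12-31",
    query={"eo:cloud_cover": {"lt": 20}},
)
items = list(search.items())

# Lazily load red and NIR bands, then compute a monthly NDVI average.
ds = load(items, bands=["red", "nir"], crs="EPSG:32614", resolution=100, chunks={})
ndvi = (ds.nir - ds.red) / (ds.nir + ds.red)
monthly = ndvi.resample(time="1MS").mean()
print(monthly.mean(dim=["x", "y"]).compute())
```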

LLM agents, he argued, can bridge this gap by automating away the technical complexity and letting domain experts work in plain language. His team built a tool for the World Resources Institute that lets users query land carbon data from WRI’s Land & Carbon Lab, such as trends in US natural grasslands since 2020, using ordinary text. “That’s the way my grandpa writes Google queries,” Thomas said. “We’ve brought complex spatial analysis down to that level.”

His three best practices for building these agents:

Build on open standards. The geospatial community has spent years building interoperable open-source formats and APIs, including STAC, COG, OGC standards, and a rich ecosystem of Python libraries. These are, as Thomas put it, “boring and painful for humans to understand but really easy for machines to understand.” An LLM agent built on top of these standards can access a huge range of data sources without custom integration work for each one.

Make outputs traceable. The ReAct pattern, alternating between a reasoning node and a tool-call node, naturally produces a record of every step the agent took: what it was thinking, what tool it called, and what the tool returned. Surface that to your users and you have given them a full provenance trail for how the analysis was produced, which data was used, and what assumptions were made (a minimal sketch of such a trace follows this list).

Let the agent apply guardrails. The same reasoning capacity that makes an LLM agent useful for analysis also makes it capable of checking its own work. Were there gaps in the data? Were there assumptions in the analysis that should be surfaced to the user? Building those checks into the agent’s reasoning loop produces outputs that are not just fast but defensible, which matters enormously when the results are going to inform policy decisions or federally funded research.
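
To make the traceability point concrete, here is a toy ReAct-style loop with the LLM’s reasoning stubbed out as scripted steps and a single hypothetical tool. Nothing here is Development Seed’s implementation; the point is that the provenance trail falls out of the pattern essentially for free.

```python
import json

def search_stac(bbox: list[float], collection: str) -> dict:
    """Hypothetical stand-in for a real STAC search tool."""
    return {"matched": 42, "collection": collection, "bbox": bbox}

TOOLS = {"search_stac": search_stac}

def run_agent(question: str, steps: list[dict]) -> list[dict]:
    """Run a scripted sequence of (thought, tool, args) steps and log everything."""
    trace = []
    for step in steps:
        observation = TOOLS[step["tool"]](**step["args"])
        trace.append({
            "question": question,
            "thought": step["thought"],        # what the agent was reasoning
            "tool": step["tool"],              # which tool it called
            "args": step["args"],              # with what arguments
            "observation": observation,        # and what came back
        })
    return trace

trace = run_agent(
    "How many recent scenes cover this grassland area?",
    steps=[{
        "thought": "Count matching scenes in the catalog for this bounding box.",
        "tool": "search_stac",
        "args": {"bbox": [-100.5, 41.0, -100.0, 41.5], "collection": "sentinel-2-l2a"},
    }],
)

# Surfacing this trace gives the user the provenance of the answer.
print(json.dumps(trace, indent=2))
```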

Thomas’s closing was that the value of geospatial data is only realized when someone actually uses it to answer a question. LLM agents lower that barrier dramatically.

Conclusion

For me, each of these talks contributed to a through-line about the value and utility of federal geospatial data assets in the AI era. AI is no different from other computing environments in one important respect: higher-quality source data yields better results. Although these talks were developed independently, they built on each other to make that point. Together, they point not only to the increasing maturity of geospatial AI, but also to how it can be used maturely. In the crowded field of excellence that was this year’s FedGeoDay program, these talks managed to stand out and speak to me.

Here is a link to the full event video: FedGeoDay 2026 Day 1