Window Functions and PostGIS

FOSS4G North America was an opportunity for me to reconnect with both community and technology. I enjoyed being able to both learn new things and refresh my skills with technologies such as PostGIS. I was reflecting on how, a couple weeks prior to the conference, I introduced the concept of PostgreSQL window functions to a junior team member in a different context. I have used them quite a bit in both PostgreSQL and BigQuery and FOSS4G got me thinking about their applicability in a spatial context. I have previously discussed them in terms of specific use cases, but they are a powerful tool that warrants an overview.

PostgreSQL window functions can perform calculations across a set of table rows related to the current row. When combined with PostGIS spatial queries, they can enable additional levels of spatial analysis. This post examines how to use PostgreSQL window functions with PostGIS spatial queries to gain deeper insights into geospatial data.

Understanding Window Functions

Window functions allow you to create a ‘window’ of related data over which you can perform calculations while still returning individual rows. Unlike aggregate functions that group rows into a single output row, window functions retain the granularity of the original dataset.

Common window functions include:

ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition of a result set.
RANK(): Assigns a rank to each row within a partition of a result set, with gaps in the ranking for tied rows.
LEAD(), LAG(): Provide access to rows that precede or succeed the current row in order to perform calculations based on values in those rows.
SUM(), AVG(), MIN(), MAX(): Perform calculations across sets of rows that are somehow related to the current row.

Spatial Window Functions with PostGIS

To apply window functions in a spatial context, consider the following general steps:

Partition the Data: Spatial queries often involve partitioning data by location or by some spatial attribute. For instance, you might partition data by geographical boundaries such as counties or neighborhoods.
Ordering: Within each partition, you can order the data based on a spatial attribute, such as distance from a certain point.
Frame Specification: Define the subset of the current partition to be used for calculations.

Here are some simple examples of how to use window functions with PostGIS. They are quite useful and I encourage you to dig deeper into the documentation and add them to your toolkit.

Example 1: Cumulative Distance Calculation

Suppose we have a table public_paths that stores a series of connected line segments. We want to calculate the cumulative distance for each segment from the starting point.

SELECT
  id,
  ST_Length(geom) AS segment_length,
  SUM(ST_Length(geom)) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_distance
FROM
  public_paths;

In this query, the window function SUM() calculates the total length from the start to each segment sequentially by ordering the rows by their `id`.

Example 2: Ranking Points of Interest by Proximity

Imagine a table points_of_interest with a column `geom` holding point geometries. We want to rank these points based on their proximity to a reference location.

WITH ref_location AS (SELECT ST_SetSRID(ST_MakePoint(-73.985656, 40.748817), 4326) AS geom)
SELECT
  p.id,
  ST_Distance(p.geom, r.geom) AS distance_from_ref,
  RANK() OVER (ORDER BY ST_Distance(p.geom, r.geom)) AS proximity_rank
FROM
  points_of_interest p, ref_location r;

The window function RANK() orders points of interest by their distance from a given reference location.

Example 3: Analyzing Sequential Events

Consider a vehicle_movements table that records the positions of buses as they travel their routes. You might want to analyze stop durations by comparing the arrival and departure times at each stop.

SELECT
  vehicle_id,
  stop_id,
  arrival_time,
  LEAD(arrival_time, 1) OVER (PARTITION BY vehicle_id, stop_id ORDER BY arrival_time) - arrival_time AS stop_duration
FROM
  vehicle_movements;

The LEAD() window function is used here to look at the following row to calculate the duration of each stop.

Example 4: Neighborhood Analysis

If you’re analyzing demographic data per neighborhood, you might want to calculate running totals or averages within a city’s neighborhoods.

SELECT
  neighborhood,
  population,
  AVG(population) OVER (PARTITION BY neighborhood) AS avg_neighborhood_population
FROM
  city_demographics;

This query calculates the average population per neighborhood using the AVG() window function partitioned by neighborhood.

Example 5: Path Interpolation

For applications like tracking wildlife movement, you might need to interpolate points along a path. Window functions can order these observations and then use spatial functions to interpolate points between them.

SELECT
  id,
  ST_LineInterpolatePoint(
geom,
1.0 / COUNT(*) OVER (PARTITION BY track_id ORDER BY timestamp)
  ) AS interpolated_point
FROM
  animal_tracks;

In this case, ST_LineInterpolatePoint() generates interpolated points along the movement paths of animals.

Performance Considerations

While window functions are powerful, they can be computationally intensive, especially when applied to large geospatial datasets. You should only perform window functions on the data you need. Using subqueries or common table expressions (CTEs) can narrow down your dataset prior to analysis.