Watching COVID-19 Data for Your County with PostgreSQL and Node

I have addressed the topic of triggered notifications a couple of times on this blog. I’ve taken the opportunity to apply the technique to a current use case: getting a notification whenever the confirmed count of COVID-19 cases changes in my county or the surrounding ones.

I am basing this workflow on the Johns Hopkins University county-level data available as an ArcGIS feature service here: https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases_US/FeatureServer

Using the “f=geojson” parameter, it is possible to download the data in a format (GeoJSON) that is readily consumable by OGR. As a result, I was able to build a core workflow with the following steps.
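
For example, hitting the county layer with a query along these lines returns GeoJSON (the layer index and query parameters here are my assumption of the usual ArcGIS REST pattern, not something taken from the gists below):

https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases_US/FeatureServer/0/query?where=1%3D1&outFields=*&f=geojson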

1. Do an initial data pull

Using OGR, I did an initial pull to create the primary table in my PostgreSQL database. This will be the table that gets watched for changes. That was done with the following command-line statement.

https://gist.github.com/geobabbler/de208c0df4fe7c7873d1bd77dd8311b8

You’ll need to replace $host, $user, $database, and $pw with your hostname, username, database name, and password respectively to load your data into PostgreSQL/PostGIS.
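
The gist has the exact command; as a rough sketch, the call looks something like the following, where the target table name (jhu_county) and the query URL are my assumptions:

# Sketch only -- the table name and layer query are assumptions, not the gist.
ogr2ogr -f "PostgreSQL" \
  PG:"host=$host user=$user dbname=$database password=$pw" \
  "https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases_US/FeatureServer/0/query?where=1%3D1&outFields=*&f=geojson" \
  -nln jhu_county -overwrite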

2. Add a “watched” boolean column

Next, in PostgreSQL, I added a boolean column to the table so that I could flag which counties I wanted to track.

https://gist.github.com/geobabbler/58fdea1741894de42fe4e447076d4c80

The value for all records defaults to false, so I needed to toggle the counties I wanted to watch to true. In my case, those were three counties in Southern Maryland.
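
A minimal sketch of both statements, assuming the table name jhu_county and the JHU field names Province_State and Admin2 (check the column names OGR actually created on import):

-- Add the flag; every county starts unwatched.
ALTER TABLE jhu_county ADD COLUMN watched boolean NOT NULL DEFAULT false;

-- Flag the three Southern Maryland counties.
UPDATE jhu_county
SET watched = true
WHERE province_state = 'Maryland'
  AND admin2 IN ('Calvert', 'Charles', 'St. Mary''s');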

3. Create a trigger function

In many database platforms, the trigger object is a function/procedure that is executed directly on the specified event. In PostgreSQL, the trigger and its function are separate objects. So, first, I had to create the trigger function.

https://gist.github.com/geobabbler/98d8c15cdea6f526d2c9c59d9ac3e567

This does nothing more than roll up the event data into a simple GeoJSON feature and broadcast it with pg_notify on a channel called ‘jhucounty’. As discussed here, NOTIFY/LISTEN is a pub/sub mechanism built into PostgreSQL that lets external systems be notified of events inside the database in a safe manner that shouldn’t block the database or create performance issues. (YMMV)
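
A sketch of such a function, assuming the table and column names used above and OGR’s default geometry column, wkb_geometry (the gist has the authoritative version):

CREATE OR REPLACE FUNCTION notify_jhucounty() RETURNS trigger AS $$
BEGIN
  -- Roll the updated row into a GeoJSON feature and broadcast it.
  PERFORM pg_notify(
    'jhucounty',
    json_build_object(
      'type', 'Feature',
      'geometry', ST_AsGeoJSON(NEW.wkb_geometry)::json,
      'properties', json_build_object(
        'county', NEW.admin2,
        'state', NEW.province_state,
        'confirmed', NEW.confirmed,
        'deaths', NEW.deaths,
        'last_update', NEW.last_update
      )
    )::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

One caveat worth knowing: pg_notify payloads are capped at 8000 bytes by default, so a detailed county polygon could blow past the limit; broadcasting only the properties is a safer variant.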

4. Create the trigger

This simply attaches a trigger to the table. Notice that the WHEN clause ensures the trigger function fires only if the updated record is watched, using the column added in step 2.

https://gist.github.com/geobabbler/094cf163062645d1e2afb14360be2340
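
With the function in place, the attachment might look like this (EXECUTE FUNCTION requires PostgreSQL 11 or later; older versions use EXECUTE PROCEDURE):

-- Fire only for rows flagged in step 2.
CREATE TRIGGER jhucounty_watch
AFTER UPDATE ON jhu_county
FOR EACH ROW
WHEN (NEW.watched)
EXECUTE FUNCTION notify_jhucounty();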

5. Update the data and notify

This step is a little trickier than it may appear. In the JHU feature service, each refresh touches every record, even when nothing has changed, and the value of the “Last_Update” field is the same for every record. This means there is no reliable way to use OGR to filter the data on import, so I have to download the full data set each time.

The approach I took was to download the data with OGR into a temporary table, then use a SQL JOIN to update the records in the primary table that had updated data. This was most effectively done using a bash script.

https://gist.github.com/geobabbler/293717732cd3486b56ae9416eed57202

The database, username, hostname, and password are all passed into the script as arguments, in that order.
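
A sketch of such a script, under the same naming assumptions as the earlier steps (the gist is the authoritative version):

#!/bin/bash
# Usage: ./update_covid.sh database user host password
DB="$1"; USER="$2"; HOST="$3"; PW="$4"

# Pull the full feature service into a staging table, replacing it each run.
ogr2ogr -f "PostgreSQL" \
  PG:"host=$HOST user=$USER dbname=$DB password=$PW" \
  "https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases_US/FeatureServer/0/query?where=1%3D1&outFields=*&f=geojson" \
  -nln jhu_county_staging -overwrite

# Join staging to the primary table and update only rows whose counts
# actually changed -- these updates are what fire the trigger from step 4.
PGPASSWORD="$PW" psql -h "$HOST" -U "$USER" -d "$DB" <<'SQL'
UPDATE jhu_county t
SET confirmed   = s.confirmed,
    deaths      = s.deaths,
    last_update = s.last_update
FROM jhu_county_staging s
WHERE t.admin2 = s.admin2
  AND t.province_state = s.province_state
  AND t.confirmed IS DISTINCT FROM s.confirmed;
SQL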

6. Listen and do something

I wrote a simple Node app to listen on the ‘jhucounty’ channel and relay the data someplace else. To keep it simple for this post, it just dumps the payload to the console.

https://gist.github.com/geobabbler/96088b494309550058163ff156e7dd44
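
A minimal sketch of such a listener, assuming the node-postgres (pg) library with connection details coming from the standard PG* environment variables:

// Listen for pg_notify broadcasts on the 'jhucounty' channel.
const { Client } = require('pg');

const client = new Client(); // reads PGHOST, PGUSER, PGDATABASE, PGPASSWORD

async function main() {
  await client.connect();
  await client.query('LISTEN jhucounty');
  client.on('notification', (msg) => {
    // msg.payload is the GeoJSON feature built by the trigger function.
    const feature = JSON.parse(msg.payload);
    console.log(feature);
  });
}

main().catch(console.error);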

Where to go from here

Most of the remaining work is setting up the Node app to do something more intelligent with the output. I plan to update it to send an email and, possibly, a text message, so that I can get the updates as they happen (or nearly so).

I’ll also set up the bash script in a cron job, so that it will run periodically. When I do that, I will most likely move all of the database parameters to environment variables.

I’ll update as I make these changes.

UPDATE:

I made two updates to automate this process on a schedule. I’ll pick up the numbering scheme above.

7. Send an email

I updated the Node script above to use the “nodemailer” library to send an email about updates. I used Gmail to send them. If you choose to do this, you’ll need to configure the Gmail account to allow SMTP connections.

https://gist.github.com/geobabbler/1abcc59b8d713399885362f87cd0db67
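
A sketch of the addition, with the account details as placeholder environment variables (the feature properties used here assume the trigger-function sketch from step 3):

// Send update emails via Gmail using nodemailer.
const nodemailer = require('nodemailer');

const transporter = nodemailer.createTransport({
  service: 'gmail',
  auth: {
    user: process.env.GMAIL_USER,
    pass: process.env.GMAIL_PASSWORD, // an app password, if 2FA is enabled
  },
});

// Called from the 'notification' handler in step 6.
async function sendUpdate(feature) {
  const p = feature.properties;
  await transporter.sendMail({
    from: process.env.GMAIL_USER,
    to: process.env.ALERT_RECIPIENT,
    subject: `COVID-19 update: ${p.county}, ${p.state}`,
    text: `Confirmed: ${p.confirmed}\nDeaths: ${p.deaths}\nLast update: ${p.last_update}`,
  });
}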

8. Set up a cron job

I use Linux, so this step is specific to that. If you use Windows, you’ll need to set something up via the Task Scheduler. I haven’t used Windows full-time in almost three years, so I won’t attempt to show that process here. This step uses the bash script shown in Step 5 above.

Open a terminal window and type:

crontab -e

This will allow me to add a line to crontab to run the script. Because I didn’t use “sudo,” the cron job will run under my identity rather than root. Add the following line and save.

*/15 * * * * /path/to/script_from_step_5.sh database user host password >> /path/to/some_log_file.log

If you’re not familiar with crontab syntax, that line runs the script every 15 minutes. You can see that this example passes the database parameters in as arguments. It’s probably best to move those to environment variables or some other method that doesn’t require hardcoding them in crontab.
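
One hedged option: crontab accepts variable assignments at the top of the file, so the credentials can live there instead of on the command line (this assumes the script is modified to read these variables rather than positional arguments):

PGDATABASE=your_database
PGUSER=your_user
PGHOST=your_host
PGPASSWORD=your_password
*/15 * * * * /path/to/script_from_step_5.sh >> /path/to/some_log_file.log

Even then, a password in crontab is only marginally better; PostgreSQL’s ~/.pgpass file is the more conventional home for it.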

Conclusion

The process described above was initiated to watch changes to COVID-19 data, but it is really applicable to any data that changes frequently. I hope you find this approach useful.