Ontario Schools COVID-19
(Written December 2021)
Steps:
1. Create free account with Elastic Cloud (note to self, I need to finish this within a couple weeks before my trial ends)
2. Find data...searching...searching
3. In Elastic, create "index patterns" that will be used for maps and graphs that will be aggregated into a dashboard
Code not hosted on GitHub (this was done with no coding, using Elastic)
Some basics first...
At one of my previous employers, the company used Elastic and Kibana to provide interactive analysis of operational runtimes for batch jobs. I thought it was cool to be able to view metrics across a time horizon, pick out anomalies, drill down for detail, and gain actionable insights.
Historically these tools were primarily used for searching and visualizing logs, but those boundaries have since been pushed into many other use cases.
Data, data, data
The Ontario government records and publishes daily data on the number of COVID cases in schools. I used the data for the period from September 2021 to December 2021.
I wanted to map the cases and, in order to do so, had to join the data with other sources containing the addresses of the schools.
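For anyone who would rather script this join than do it by hand, here is a minimal Python sketch. The column names (school, date, cases, address, postal_code) and the sample rows are made up for illustration; they are not the actual field names in the Ontario datasets, and matching on a normalized school name is just one assumed way to line the two sources up.

```python
import csv
import io

# Hypothetical sample data; the real Ontario files use different columns.
CASES_CSV = """school,date,cases
Maple Grove PS,2021-11-01,2
Lakeview SS,2021-11-01,1
Unknown Academy,2021-11-01,3
"""

ADDRESSES_CSV = """school,address,postal_code
maple grove  PS,12 Maple St,M4B 1B3
Lakeview SS,99 Lake Rd,L4M 2K1
"""


def normalize(name):
    """Lowercase and collapse whitespace so near-identical names still match."""
    return " ".join(name.lower().split())


def join_cases_with_addresses(cases_text, addresses_text):
    """Attach an address to each case row; collect schools with no address."""
    addresses = {
        normalize(row["school"]): row
        for row in csv.DictReader(io.StringIO(addresses_text))
    }
    joined, unmatched = [], []
    for row in csv.DictReader(io.StringIO(cases_text)):
        match = addresses.get(normalize(row["school"]))
        if match is None:
            # Without an address this school cannot be placed on a map.
            unmatched.append(row["school"])
            continue
        joined.append({
            **row,
            "address": match["address"],
            "postal_code": match["postal_code"],
        })
    return joined, unmatched


joined, unmatched = join_cases_with_addresses(CASES_CSV, ADDRESSES_CSV)
```

Keeping the unmatched list around is deliberate: any school that fails to match silently disappears from the map, so it is worth checking how many fall through.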
A quirk of the data is that it reports the total COVID cases per day per school, so I wasn't able to validate the day-over-day increase in cases. The same case could be reported on multiple days, and I wouldn't know whether it was the exact same case or whether it had been dropped, only to be replaced by a new one.
This meant that I could only report on the total per day, or get an idea of the general magnitude. It is not a true depiction of the daily increase in cases, nor a true representation across a time continuum.
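To make the limitation concrete, here is a small Python sketch with made-up numbers for a single school. It assumes the series starts from zero cases before the first day, which is an assumption of the example and not something the data states. Differencing the daily totals only gives a lower bound on new cases, because a total that stays flat (or drops) can hide old cases being replaced by new ones.

```python
def min_new_cases(daily_totals):
    """Return, per day, the minimum number of new cases implied by the totals.

    Only increases are counted. A flat or falling total contributes 0, even
    though resolved cases may have been replaced by new ones that day.
    """
    bounds = []
    previous = 0  # assumption: no cases before the first reported day
    for total in daily_totals:
        bounds.append(max(total - previous, 0))
        previous = total
    return bounds


# Hypothetical daily totals for one school over six days.
totals = [2, 2, 3, 1, 1, 2]
lower_bounds = min_new_cases(totals)  # [2, 0, 1, 0, 0, 1]
```

On day two the total stays at 2, which could be the same two cases or one resolved case plus one new case. The data cannot distinguish the two, so the true number of new cases is somewhere at or above the lower bound.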
Elasticsearch
Using a free trial version, I was able to quickly load the data. Because Elasticsearch is a document-oriented database, I needed to create an index, which I could later search using index patterns. It was interesting to see concepts such as NoSQL, graph-oriented databases, pipelining, and ETL all being possibilities with the ELK stack. I didn't even touch Logstash, but through my research I could envision the power and purpose of using it to set up a pipeline for ingestion, orchestrate transformations, and draw on the available plugins for further extensibility.
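This was all done through the Elastic UI with no code, but as a rough sketch of what a scripted equivalent might look like, the Python below builds an index mapping and a request body in the newline-delimited JSON format that the Elasticsearch bulk API expects. The index name and field names are hypothetical, and the example assumes each row already carries latitude and longitude values. It only builds the payloads and does not send anything to a cluster.

```python
import json

INDEX_NAME = "ontario-school-cases"  # hypothetical index name

# Mapping sketch: a geo_point field is what allows documents to be plotted
# on a map; the other fields are assumptions for this example.
MAPPING = {
    "mappings": {
        "properties": {
            "school": {"type": "keyword"},
            "date": {"type": "date"},
            "cases": {"type": "integer"},
            "location": {"type": "geo_point"},
        }
    }
}


def to_bulk_ndjson(index_name, rows):
    """Build a bulk body: one action line, then one document line, per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({
            "school": row["school"],
            "date": row["date"],
            "cases": int(row["cases"]),
            "location": {"lat": float(row["lat"]), "lon": float(row["lon"])},
        }))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_rows = [
    {"school": "Maple Grove PS", "date": "2021-11-01", "cases": "2",
     "lat": "43.70", "lon": "-79.31"},
]
bulk_body = to_bulk_ndjson(INDEX_NAME, sample_rows)
```

The mapping would be sent when creating the index and the bulk body posted afterwards; a Logstash pipeline would be the more natural home for this if the project grew.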
Kibana
Now to the fun part...once I had the data, I was able to quickly map out the cases and visualize them on an interactive heatmap that can be drilled into.
(apologies in advance, you will have to click the play button to play the video)
...or one with a sliding time scale to visualize week over week
By utilizing various graph templates, I was able to start gleaning some insights:
Note the increases for Barrie and Sudbury
Note the decrease in Dufferin-Peel Catholic and the increase in York
Looking at postal codes in the Toronto area (those starting with the letter 'M') to see whether the trend indicates that cases are recent, occurred earlier, or have been consistent throughout
Looking at a specific school and its cases over time to see whether cases are happening more recently, occurred earlier in the year, or have been consistent throughout
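The charts above were built in Kibana, but the week-over-week view for Toronto postal codes can be approximated in a few lines of Python. This is only a sketch with invented rows and hypothetical field names. Because the source reports totals per day, it takes the peak daily total within each ISO week instead of summing across days, since a sum would count the same case repeatedly.

```python
from collections import defaultdict
from datetime import date


def weekly_peak_for_prefix(rows, prefix="M"):
    """Peak daily case total per ISO week for postal codes with this prefix."""
    daily = defaultdict(int)
    for row in rows:
        if row["postal_code"].upper().startswith(prefix):
            daily[row["date"]] += row["cases"]

    weekly = {}
    for day, total in daily.items():
        year, week, _ = date.fromisoformat(day).isocalendar()
        key = (year, week)
        weekly[key] = max(weekly.get(key, 0), total)
    return dict(sorted(weekly.items()))


# Hypothetical rows: three Toronto-area ('M') schools and one outside it.
rows = [
    {"postal_code": "M4B 1B3", "date": "2021-11-01", "cases": 2},
    {"postal_code": "M5V 2T6", "date": "2021-11-01", "cases": 1},
    {"postal_code": "L4M 2K1", "date": "2021-11-01", "cases": 5},
    {"postal_code": "M4B 1B3", "date": "2021-11-03", "cases": 1},
    {"postal_code": "M4B 1B3", "date": "2021-11-08", "cases": 4},
]
peaks = weekly_peak_for_prefix(rows)
```

Comparing the peaks from one week to the next gives a rough sense of whether cases are recent, earlier, or steady, with the same caveat about magnitude as before.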
After creating disparate charts and graphs, Kibana allows you to easily pull them together into an interactive dashboard.
I didn't get too crazy with trying to source data to supplement what the Ontario government provides for COVID cases in schools. I did thoroughly enjoy dabbling with Elastic/Kibana as a non-programmatic way to quickly tap into data and produce some meaningful and interactive maps and charts.
To summarize some of the trends observed:
Municipalities: Barrie, Sudbury
York Region District School Board
Municipalities: Brampton, Mississauga
Dufferin-Peel Catholic District School Board
There could be potential for some interesting analysis correlating the number of COVID cases with specific events, e.g. when a school board changes policy to switch from in-person to online or vice versa. Even with the aforementioned observation indicating a downtrend in cases during mid/late October, it would be interesting to know what the cause or correlation could have been.
But for now, I'm content to have dabbled with Elastic/Kibana, and I realize that from a technical perspective I could have explored much more in order to set up a proper pipeline should I wish to expand on this project.
From the perspective of a parent and viewer of the data, it's daunting to see the numbers ever increasing. As of December 24, 2021, Ontario is reporting 1,097 out of 4,844 schools with cases (~23% of schools), with a cumulative total of 12,062 cases.