This page provides data sets and a series of exercises related to the techniques I use in my research group. It assumes that you have cloned a local copy of my public data analysis repository. References to files and directories are relative to the root directory of this repository.
Much of my data work involves analyzing and visualizing the timelines of individuals who interact with health services and/or emergency social supports. This is very sensitive data and can only be accessed under security and privacy protocols that are approved by the University of Calgary ethics review boards. In other words, it is not possible to share it here.
To have a data set that we can use here to learn about data analytics, I have “disguised” a public domain data set so that it appears to represent individuals who are interacting with mental health, police, and emergency shelter services. Each individual is linked with one or more timestamped events of the following types:
Broadly speaking, we want to use these timestamped events to identify individuals at risk of having their first adverse outcome or a repeat adverse outcome.
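As a rough illustration of what "using timestamped events" can look like in practice, here is a minimal sketch that turns a list of events into a time-to-first-adverse-outcome per individual. It is not taken from the repository: the record layout, the event type names ("contact", "adverse"), and the function name `time_to_first_adverse` are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical event records: (individual id, timestamp, event type).
# The event type names are placeholders, not the ones used in the repository.
events = [
    ("A", datetime(2020, 1, 1), "contact"),
    ("A", datetime(2020, 3, 1), "adverse"),
    ("A", datetime(2020, 6, 1), "adverse"),
    ("B", datetime(2020, 2, 1), "contact"),
    ("B", datetime(2020, 5, 1), "contact"),
]


def time_to_first_adverse(events, adverse_type="adverse"):
    """Return {id: (days, observed)} measured from each individual's first event."""
    first_seen, last_seen, first_adverse = {}, {}, {}
    # Walk the events in chronological order so "first" and "last" are well defined.
    for pid, ts, etype in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(pid, ts)
        last_seen[pid] = ts
        if etype == adverse_type:
            first_adverse.setdefault(pid, ts)

    result = {}
    for pid, start in first_seen.items():
        if pid in first_adverse:
            result[pid] = ((first_adverse[pid] - start).days, True)
        else:
            # No adverse event observed: censor at the last recorded event.
            result[pid] = ((last_seen[pid] - start).days, False)
    return result


durations = time_to_first_adverse(events)
```

Each individual ends up with a duration and a flag saying whether the adverse outcome was actually observed or the timeline simply ended first (censoring). That pair is the usual input to a survival analysis.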
If you are interested in finding out who the individuals in our data set actually are, click here.
Before looking at these notebooks, be sure you’ve read the relevant background material.
demo/Exporatory Data Analysis.ipynb introduces some EDA techniques and also demonstrates some of the properties of our data set.
demo/Survival Analysis.ipynb demonstrates survival functions and Cox proportional hazards regression using scikit-learn libraries.
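To give a feel for what a survival function is before opening the notebook, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is an illustration only and not necessarily how the notebook computes its curves; the function name `kaplan_meier` and the sample durations are assumptions made up for this example.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one event was observed."""
    event_times = sorted({t for t, o in zip(durations, observed) if o})
    survival, curve = 1.0, []
    for t in event_times:
        # Individuals still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Individuals whose event (not censoring) happens exactly at time t.
        events_at_t = sum(1 for d, o in zip(durations, observed) if d == t and o)
        survival *= 1.0 - events_at_t / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical durations in days; False marks a censored individual.
curve = kaplan_meier([5, 8, 8, 12, 15], [True, True, False, True, False])
```

With these made-up inputs the estimated survival probability steps down to roughly 0.8, 0.6, and 0.3 at days 5, 8, and 12. Censored individuals never trigger a step, but they still count toward the at-risk group until they drop out.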