Learn, Imagine, Build
Geoff Messier's Projects & Ideas
The purpose of this section is to give students who are brand new to our group something to look at to get up to speed. Here, I’m trying to strike a balance between giving you the fundamentals that everyone should know and not spending too much time exploring techniques that you might not use in your specific project.
All our development work is done in Python. For an introduction to the language, I like The Quick Python Book, which can be read for free through the University of Calgary library. There are also countless good Python tutorials and introductions on the web.
Our development environment is Jupyter notebooks, which combine Python code, plots and notes written in the Markdown language. Think of them as a super-powered replacement for a log book; they are very popular in the data science community. I prefer to run the Jupyter “lab” interface rather than the “notebook” interface since it does a better job of also handling plain Python files.
We also make extensive use of the following libraries:
The scikit-learn machine learning library is the standard for classical (non-deep) machine learning models. The tutorials are useful but the user guide is more focused. If you’re new to machine learning, go through sections 1.1.1, 1.1.11, 1.10, 1.11.1, 1.11.2, 1.17.1, 3 and 10.
The pandas library is designed to manipulate data in table form. Start with the “Getting started” section of the pandas documentation (in particular “10 minutes to pandas”). The full user guide is more useful as a reference when you encounter specific problems during your software development work.
Most of our machine learning problems are imbalanced (there are many more negative cases than positive cases). The imbalanced-learn library contains several algorithms for handling imbalanced problems. If you’re new, read sections 1, 2.1 and 7 to get started.
We also make use of the NumPy, matplotlib and SciPy libraries. It’s not necessary to become proficient with these right away. Just scan the introductory material and be aware of them as a resource for when you start your research work.
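To give a feel for how these libraries fit together, here is a minimal sketch rather than a description of how any of our projects actually work. The data is synthetic, and the column names (feature_a, feature_b, label) and the 950/50 class split are made up for illustration. The oversampling step is done by hand with scikit-learn's resample utility to show the basic idea of rebalancing; it is a stand-in for, not an example of, the resamplers in imbalanced-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, made-up data: two numeric features and a rare positive label.
rng = np.random.default_rng(0)
n_neg, n_pos = 950, 50
df = pd.DataFrame({
    "feature_a": np.concatenate([rng.normal(0.0, 1.0, n_neg),
                                 rng.normal(2.0, 1.0, n_pos)]),
    "feature_b": np.concatenate([rng.normal(0.0, 1.0, n_neg),
                                 rng.normal(2.0, 1.0, n_pos)]),
    "label": [0] * n_neg + [1] * n_pos,
})

# pandas: count the rows in each class to see how imbalanced the table is.
class_counts = df["label"].value_counts()
print(class_counts)

# Hold out a stratified test set so both classes appear in each split.
train_df, test_df = train_test_split(
    df, test_size=0.25, stratify=df["label"], random_state=0
)

# Hand-rolled random oversampling (illustration only): draw positive training
# rows with replacement until there are as many as negative training rows.
train_neg = train_df[train_df["label"] == 0]
train_pos = train_df[train_df["label"] == 1]
train_pos_upsampled = resample(
    train_pos, replace=True, n_samples=len(train_neg), random_state=0
)
balanced_df = pd.concat([train_neg, train_pos_upsampled])

# scikit-learn: fit a classical model on the rebalanced training data and
# evaluate it on the untouched (still imbalanced) test set.
features = ["feature_a", "feature_b"]
model = LogisticRegression()
model.fit(balanced_df[features], balanced_df["label"])
predictions = model.predict(test_df[features])

# Recall on the positive class is usually more informative than accuracy
# when positives are rare.
recall = recall_score(test_df["label"], predictions)
print(f"Recall on the positive class: {recall:.2f}")
```

Only the training split is rebalanced here; the test split keeps its original class proportions so the score reflects the imbalance the model would face in practice.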