Development Environment

All of the tools we use in my group are open source and can be downloaded and used for free. If you have the time, consider joining an open source development community to give back to the amazing array of tools that are available. It makes the data science community better and is excellent experience when it comes time to apply for a job.

Operating System

My team tends to use either macOS or Linux. These are far more stable for scientific computing than Windows (IMHO). If you’re using Linux, Ubuntu is the best choice simply because it’s so common and most software packages are tested on it. Note: Be sure you’re using a native installation and not running Linux inside a virtual machine. A virtual machine installation will not give you the performance or stability that you will need.

Development Environment

The first step in installing all of the libraries and tools mentioned here is to install Miniconda on your computer. This will install Python and the pip installer, which you need to install the other packages.

Once Miniconda is installed, you should install:

JupyterLab
pandas
NumPy
scikit-learn
Matplotlib

In all cases, google the installation procedure for each of these libraries/tools and use the pip method when available.
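
Once everything is installed, a quick way to confirm that the packages are visible to your Python interpreter is a small check script. The following is a minimal sketch, not an official procedure: the function name missing_packages and the REQUIRED table are my own illustrative choices, and the import names are the ones these projects normally use (note that scikit-learn is imported as sklearn).

```python
import importlib.util

# Display name -> import name for the tools listed above.
# The import name can differ from the package name (scikit-learn -> sklearn).
REQUIRED = {
    "JupyterLab": "jupyterlab",
    "pandas": "pandas",
    "NumPy": "numpy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages the interpreter cannot find."""
    return [
        name
        for name, module in required.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still to install:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the Python that Miniconda installed. If a package shows up as missing even though you installed it, you are probably running a different Python interpreter than the one pip installed into.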

GitHub

All code developed by my research group is managed using GitHub. This is good software development practice and allows members of our team to collaborate on a common code base. Prospective software development and data science employers will often ask candidates if they have a GitHub portfolio of the code they worked on during their studies.

You will need to know how to:

Clone a repository to create a local copy on your computer.
Create a branch to work in.
Commit changes to your branch. Note: Commit changes only to mark development milestones. Do not use daily commits as a way to “save your work”.
Potentially merge changes from the master branch into your branch.
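
The four steps above can be rehearsed safely in a throwaway directory. The following is a minimal sketch, assuming git is installed and on your PATH. It drives the ordinary git commands (clone, checkout -b, add, commit, fetch, merge) from Python. The helper names git and demo_workflow, the file names, the branch name my-feature, and the example identity are all hypothetical, and a local directory stands in for the GitHub repository.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in cwd and return its stdout; raises if git fails."""
    # The identity is passed per command so the sketch does not depend on
    # (or modify) your global git configuration.
    cmd = ["git", "-c", "user.name=Example Student",
           "-c", "user.email=student@example.com", *args]
    done = subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True)
    return done.stdout.strip()


def demo_workflow(root):
    """Clone, branch, commit, and merge inside the scratch directory root."""
    root = Path(root)

    # Stand-in for the shared repository that would normally live on GitHub.
    upstream = root / "upstream"
    upstream.mkdir()
    git("init", cwd=upstream)
    (upstream / "README.md").write_text("project\n")
    git("add", "README.md", cwd=upstream)
    git("commit", "-m", "Initial commit", cwd=upstream)
    # The default branch may be called master or main, so ask git.
    base = git("rev-parse", "--abbrev-ref", "HEAD", cwd=upstream)

    # 1. Clone the repository to create a local copy.
    work = root / "work"
    git("clone", str(upstream), str(work), cwd=root)

    # 2. Create a branch to work in.
    git("checkout", "-b", "my-feature", cwd=work)

    # 3. Commit a change that marks a development milestone.
    (work / "analysis.py").write_text("print('hello')\n")
    git("add", "analysis.py", cwd=work)
    git("commit", "-m", "Add first analysis script", cwd=work)

    # Meanwhile, a teammate updates the default branch of the shared repository.
    (upstream / "NOTES.md").write_text("notes\n")
    git("add", "NOTES.md", cwd=upstream)
    git("commit", "-m", "Add notes", cwd=upstream)

    # 4. Merge the updated default branch into your branch.
    git("fetch", "origin", cwd=work)
    git("merge", "--no-edit", f"origin/{base}", cwd=work)

    return sorted(p.name for p in work.iterdir() if p.name != ".git")


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed or not on PATH")
    else:
        with tempfile.TemporaryDirectory() as scratch:
            print(demo_workflow(scratch))
```

In day-to-day work you would type the same git commands directly in a terminal. The Python wrapper is only there so the whole sequence can be run and inspected in one go.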

A good resource is the Pro Git book. Concentrate on Chapters 1-3 and the relevant commands from Appendix A3. The reference guide is also handy.

You will need to install:

The git command line tools.
The nbdime diff/merge tools for Jupyter notebooks.
The gitk branch visualization tool.
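
To confirm that these tools ended up on your PATH, a short check like the one below can help. This is a sketch under the assumption that the tools are exposed under the executable names git, gitk, and nbdime, which is how they are normally installed. The function name missing_tools is my own.

```python
import shutil

# Executable names assumed for the three tools listed above.
TOOLS = ["git", "gitk", "nbdime"]


def missing_tools(tools=TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git, gitk, and nbdime are all available.")
```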

LaTeX

An important part of research is writing about your results. For your thesis and all technical papers, you will be using LaTeX. I write LaTeX directly in a text editor and compile it on the command line. However, most students prefer Overleaf, and the Overleaf website also has some good tutorials. For drawing diagrams, Inkscape is a good choice.

Be sure to also check out my page on effective technical writing.