
The Toolbox

This is the very first post on here and what better way to begin than to take a peek at the tools we’ll be using to generate much of the content on x.dat.

At the outset, it’s best to mention that this is not an exhaustive list; just something to get started with. That said, tools will be the subject of many a future post, no doubt.

Analysis

Python and R both come with their own rich sets of libraries for data analysis and modeling. This won’t be yet another site to expound on the pros and cons of one over the other. A search for ‘python vs r for data science’ should bring up enough material for anyone still on the fence. I predominantly use Python but prefer R for its superb plotting library ‘ggplot2’. Luckily enough, R libraries can be called from within Python, for example through the rpy2 package.

Visualization

We have many choices here, all the more so depending on how user-friendly and interactive one wants to make the visualizations, not to mention how much work one is willing to put into generating them. Another important consideration is whether the tools are proprietary or open-source. We will only be looking at open-source options here. Note that proprietary tools often have a free version available, but with limited functionality. One widely used example is Tableau, with its free Tableau Public edition.

We’ll also look at matplotlib and ggplot2, plotting libraries from Python and R respectively, which are powerful enough to cover many a use-case.
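To give a flavour of what the Python side looks like, here is a minimal matplotlib sketch. The data is made up and the helper name `render_line_plot` is hypothetical; it simply draws a line chart and returns it as PNG bytes, using the non-interactive `Agg` backend so that no display is needed.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; works without a display
import matplotlib.pyplot as plt


def render_line_plot(xs, ys, title="Sample plot"):
    """Draw a simple line chart and return it as PNG bytes.

    Hypothetical helper for illustration only.
    """
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure's memory
    return buf.getvalue()


# Made-up data: the squares of 0..9
xs = list(range(10))
ys = [x ** 2 for x in xs]
png_bytes = render_line_plot(xs, ys)
```

In a Jupyter notebook one would usually skip the byte buffer and let the figure render inline; writing `png_bytes` to a file gives an image that can be dropped into a post.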

Development Environments (IDE)

Development environments are part of the everyday life of a developer/data-scientist. They are also, often, tools that one ends up with rather than consciously chooses. Nevertheless, a good IDE goes a long way in making the coding experience more of a “walk in the park” than a “day at the dentist’s”.

Anaconda is a very widely used distribution for both Python and R, although I primarily use it for Python. It comes with the libraries considered essential for data science, and its conda package manager can install and manage multiple versions of these libraries. Technically, Anaconda is the distribution, while Spyder is the IDE that ships with it. I almost never use Spyder, thanks to Jupyter, the web-based app that allows one to create highly reproducible work in the form of notebooks.

RStudio is an IDE for R. It is the equivalent of Spyder (rather, it should be the other way around). Although I don’t use Spyder, I use RStudio almost exclusively for R, only occasionally preferring the shell that comes with a native R install.

And lastly, although they are not full-blown IDEs, I often find myself using Sublime Text and Atom both for coding up my Python projects and for authoring Markdown documents, just like the one you are reading!