5 helpful resources to learn data science

Here are some helpful resources for learning data science:

  • 80,000 Hours is a website that explains evidence-based ways to have a positive impact on the world through your working life. The site features an excellent high-level career review of data science, including reasons why an industry data science role may offer some of the satisfactions of academia without many of its downsides.
  • Garrett Grolemund and Hadley Wickham’s R for Data Science is available free online and gives a clear, systematic introduction to thinking and programming like a data scientist: getting data into the right form, modelling and visualising it to derive insights, and communicating the results. As the title suggests, Grolemund and Wickham teach data science using the programming language R; it’s also worth learning at least the basics of Python, R’s main competitor in this space.
  • Rob Hyndman and George Athanasopoulos’ Forecasting: Principles and Practice, also available free online, is an excellent introduction to forecasting (the specific area of statistics and data science I currently work in). The book is highly readable and explains technical concepts clearly.
  • Past and current Kaggle competitions, in particular the “kernels” (now called Notebooks) in which participants outline the approaches they took to each challenge, show the cutting-edge methods that practitioners use to solve real data science problems.
  • McKinsey’s Analytics Insights show the different ways artificial intelligence intersects with the real business needs of leading companies around the world. This matters because a key responsibility of a data scientist is to weigh business impact when deciding which results are significant and which truly matter. (See also McKinsey’s executive’s guide to AI for a helpful high-level overview of supervised, unsupervised, and reinforcement learning, which explains what each approach involves and the key machine learning algorithms used in each.)