Instead, plot its components, and check that they make sense

Photo by Angela Compagnone on Unsplash

Many data scientists approach modelling as a feature-engineering task: they gather features, cross-validate, and if the score isn’t good enough, they keep adding more features until it is. In forecasting, where there is typically little data, this is risky. It can be much more effective to look inside your models and understand what they are doing. Prophet makes this especially easy thanks to its plot_components function. Let’s look at some examples of when using it can help diagnose poor forecasts.

Imaginary seasonality appears like a ghost!

Let’s generate some data completely at random…
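As a sketch (the article’s own code isn’t reproduced here), one way to generate a purely random daily series with the standard library; by construction, any seasonality a decomposition finds in it is imaginary:

```python
import random
from datetime import date, timedelta

random.seed(0)

# Two years of daily observations, drawn i.i.d. from a standard normal:
# there is no trend and no seasonality in this data by construction.
start = date(2019, 1, 1)
dates = [start + timedelta(days=i) for i in range(730)]
values = [random.gauss(0, 1) for _ in dates]

# Any "weekly" or "yearly" component a model extracts from this series
# is an artefact, which is exactly what plotting components can reveal.
```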

Make your return types more precise!

Photo by Brett Jordan on Unsplash

What happens if you don’t use overload?

Suppose we have a function which takes a boolean argument, and that its return type depends on that argument’s value: for one value it returns None, and for the other it returns a Cat:

By inspecting the function, we can see which return type to expect in each branch: None in one case, and Cat in the other. However, mypy doesn’t peek inside the function, and so believes every call returns the same type. Indeed, running the above snippet, we get: note: Revealed type is 'Union[main.Cat, None]'…
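A minimal sketch of the idea (the class and function names here are assumptions, not the article’s code): with typing.overload, mypy can narrow the return type based on the literal argument, instead of reporting Optional[Cat] for every call:

```python
from typing import Literal, Optional, overload


class Cat:
    """A stand-in class; the real article's class is assumed to be similar."""


# The overloads are seen only by the type checker; they tell mypy which
# return type corresponds to which argument value.
@overload
def find_cat(missing: Literal[True]) -> None: ...
@overload
def find_cat(missing: Literal[False]) -> Cat: ...


def find_cat(missing: bool) -> Optional[Cat]:
    # Without the overloads above, mypy reports Optional[Cat] for every
    # call; with them, reveal_type(find_cat(False)) is just Cat.
    if missing:
        return None
    return Cat()
```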

Learn how, in some cases, you can calculate posterior distributions by hand!

Photo by Scott Graham on Unsplash

Suppose you’ve tossed a coin 1,000 times and obtained 292 heads. You’d like to know the probability of obtaining heads from a single coin toss, but you don’t just want a single estimate, you want an entire distribution. If you define

  • y: the number of heads you obtain
  • θ: the probability of obtaining heads from a single coin toss

and then model y as a binomial distribution with n=1,000, then the posterior distribution is very easy to obtain with just a few lines of code:
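The article’s code isn’t shown above, but the conjugate result can be sketched in pure Python (assuming a flat Beta(1, 1) prior): with y heads out of n tosses, the posterior for θ is Beta(1 + y, 1 + n − y):

```python
# Conjugate update: with a flat Beta(1, 1) prior and y ~ Binomial(n, theta),
# the posterior for theta is Beta(1 + y, 1 + n - y).
heads, n = 292, 1_000
alpha_post = 1 + heads            # 293
beta_post = 1 + (n - heads)       # 709

# Posterior mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = alpha_post / (alpha_post + beta_post)
```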

Quickly find out how many pull requests you’ve submitted and/or reviewed!

Photo by Markus Winkler on Unsplash

Say you want to show off how many pull requests you’ve submitted this year, or how many you’ve reviewed. Manually counting them on GitHub would be hard work…but there’s a much easier way.


In a Python 3 virtual environment, install it with pip:

Generate an access token

On your GitHub account, go to “settings”
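As a sketch (the tool the article installs isn’t named above, so this uses only the standard library and GitHub’s public search API), you can count matching pull requests by querying the search endpoint and reading its total_count field; pass the token you just generated in the Authorization header when making the actual request:

```python
import json
import urllib.parse


def pr_count_url(username: str, year: int, role: str = "author") -> str:
    """Build a GitHub search-API URL counting a user's pull requests.

    role="author" counts PRs you submitted;
    role="reviewed-by" counts PRs you reviewed.
    """
    query = f"is:pr {role}:{username} created:{year}-01-01..{year}-12-31"
    return "https://api.github.com/search/issues?q=" + urllib.parse.quote(query)


def total_count(response_body: str) -> int:
    # The search endpoint reports the number of matches as "total_count".
    return json.loads(response_body)["total_count"]
```

To fetch the count, request the URL with urllib.request, sending your token in an `Authorization: token <your token>` header, and pass the response body to `total_count`.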

Maintaining notebooks is hard work
Photo by Julia Joppien on Unsplash

We data scientists love Jupyter Notebooks: they enable fast prototyping, let us tell stories with our code, and allow us to explore datasets thoroughly. Yet, as anyone who’s tried to keep a suite of Jupyter Notebooks under version control will tell you, they’re really hard to maintain.

Data scientists love Jupyter Notebooks…

There are many reasons for this, e.g.:

  • enable fast prototyping
  • let you tell stories with your code
  • allow you to thoroughly explore your datasets

…but Python code quality tools don’t!

If we want to use any of the following excellent tools:

  • flake8 (style guide enforcement)
  • pylint (suggests refactorings, brings up code smells)
  • mypy (checks static type annotations)
  • and many, many more…
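To see why these tools struggle, here’s a minimal stdlib-only sketch: an .ipynb file is JSON, with source code stored as lists of strings inside cell objects, so linters that expect plain .py files can’t parse it directly:

```python
import json

# A minimal notebook file: the .ipynb format is JSON, and the code lives
# as strings inside cell objects rather than as top-level Python source.
notebook = {
    "cells": [
        {"cell_type": "code", "source": ["x = 1\n", "print(x)\n"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
}
raw = json.dumps(notebook)

# What a linter would see on disk starts with '{', not Python code.
```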

Make them interactive, with hideable code cells


You’ve just written an amazing Jupyter Notebook, and you’d like to send it to your coworkers. Asking them to install Jupyter isn’t an option, and neither is asking IT for a server on which to host your page. What do you do?

I’ll show you how to export your notebook as a self-contained HTML report which anyone can open in their browser. I’ll start with the simplest possible example of exporting an HTML report, then show how to hide the input cells, and finally how to toggle showing/hiding code cells.
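As a taster (assuming nbconvert ≥ 6 and a hypothetical notebook called report.ipynb), the simplest export is a single CLI call; adding --no-input drops the code cells from the report entirely:

```python
import shlex

# Export report.ipynb to a self-contained HTML file; the --no-input flag
# tells nbconvert to omit the code cells from the output.
cmd = "jupyter nbconvert --to html --no-input report.ipynb"
args = shlex.split(cmd)

# To actually run it from Python:
# import subprocess
# subprocess.run(args, check=True)
```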

Source material can be found…

Marco Gorelli

Data Scientist, pandas maintainer, Kaggle competitions expert, Univ. of Oxford MSc
