The XKCD comics network. Check it out here!

I love XKCD. According to its website, the webcomic is about romance, sarcasm, math, and language, but over the years Randall Munroe has explored many other topics as well, some of them more than once.

I wanted to know the structure of this fantastic stick-figure world he created, so in my spare time I scraped all his webcomics from the interblag (or blagosphere), then extracted all the relevant words from each, and finally plotted the result below. In the following graph, each node represents a comic, and two comics share an edge if they contain words in common. …
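The node-and-edge rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list, the `relevant_words` and `build_comic_graph` helpers, and the sample texts are all made up here, since the excerpt does not say how the "relevant" words were actually chosen.

```python
from itertools import combinations

# Hypothetical stopword list; the post does not say how "relevant" words were picked.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}


def relevant_words(text):
    """Lowercase each word, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def build_comic_graph(comics):
    """Map each pair of comic ids to the words they share (pairs with none are omitted)."""
    word_sets = {cid: relevant_words(text) for cid, text in comics.items()}
    edges = {}
    for a, b in combinations(sorted(word_sets), 2):
        shared = word_sets[a] & word_sets[b]
        if shared:  # an edge exists only when the two comics have words in common
            edges[(a, b)] = shared
    return edges


# Made-up sample texts, not real XKCD transcripts.
comics = {
    1: "The robot learns sarcasm.",
    2: "A proof about sarcasm and math.",
    3: "Velociraptors in the park.",
}
edges = build_comic_graph(comics)
```

Here comics 1 and 2 end up connected through the word "sarcasm", while comic 3 stays isolated. The resulting edge list could then be exported to a graph tool for plotting.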

Many Data Scientists use an automated tool for A/B tests, like Google’s Firebase or Optimize. These tools let you choose which metrics to pay attention to, and they automatically tell you when your test has reached statistical significance.

However, sometimes you need to do everything manually. That’s why Data Scientists get hired, right? Maybe the tool isn’t flexible enough for your needs, or it’s buggy and causes a disruption in user experience.

Anyway, here are detailed instructions on how to add the significance level of your test directly in BigQuery, in 3 simple steps.
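The excerpt stops before any SQL is shown, so the snippet below is not the post's method. As a rough sketch of what "significance level" can mean for a conversion-rate A/B test, here is a pooled two-proportion z-test in plain Python. The function name and the choice of test are assumptions made for illustration.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test.

    conv_a / conv_b are conversion counts, n_a / n_b are users per variant.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants converted at 0% or 100%: no evidence of a difference.
        return 1.0
    z = (p_b - p_a) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, `two_proportion_p_value(100, 1000, 150, 1000)` compares a 10% against a 15% conversion rate; a result below the chosen threshold (commonly 0.05) would be reported as significant.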

Step 1: write a scheduled query in BigQuery

First, you need to create a scheduled…

A bot that knows SQL

10 days after joining fromAtoB, in September 2019, when I was still living in a hotel, we decided to restructure our Google DataStudio dashboards. The goal was to help people understand what questions could already be answered with a chart. Simple, repeated questions about the data should not be manually answered every time they are asked. In an ideal world, people should be able to answer them themselves by quickly accessing a dashboard.

To make our dashboards more user-friendly, I had the idea of developing a Slack bot. A friendly Slack bot that would redirect users to our dashboards…

The other day I needed to revise Gephi layouts when preparing for a teaching/consultancy service for a friend who does Social Network Analysis. Here is a summary of what I re-learned.

Gephi is an amazing open-source network analysis and (interactive!) visualization software with tons of really useful tools for exploring graph data, calculating statistics, detecting clusters, communities, etc. It requires no coding skills. Even if you have coding skills, please stop using Python’s NetworkX for a bit and try Gephi. It’s worth it.

One of the very nice features Gephi offers is a bunch of different layout algorithms — that…

Today I woke up and I realized that even though I love automating things, I kept repeating the following behavior:

  1. I decide to start a new experiment.
  2. I run jupyter-notebook on the terminal.
  3. I create a new notebook in the dashboard.
  4. I rename the notebook.
  5. I start hacking.

I had always wanted to just type jupyter-notebook experiment.ipynb and for a named notebook to appear. That would save me about 10 precious seconds that I could maybe use to obsess even more about Coronavirus numbers!

So here’s a quick hack so that you never, ever have to rename a notebook again. It…
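The post's own hack is cut off above, so what follows is a different, assumed approach rather than the author's: a small Python helper that writes an empty notebook file under the requested name if it does not exist yet, and then opens it. The `ensure_notebook` name is hypothetical, and the launch step assumes the `jupyter-notebook` command mentioned above is on the PATH.

```python
import json
import subprocess
import sys
from pathlib import Path


def ensure_notebook(path):
    """Create an empty nbformat-4 notebook at `path` unless the file already exists."""
    path = Path(path)
    if not path.exists():
        empty = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
        path.write_text(json.dumps(empty, indent=1))
    return path


if (
    __name__ == "__main__"
    and len(sys.argv) == 2
    and sys.argv[1].endswith(".ipynb")
):
    # Open the (possibly just created) notebook by name.
    subprocess.run(["jupyter-notebook", str(ensure_notebook(sys.argv[1]))])
```

Saved under a hypothetical name such as `nb.py`, running `python nb.py experiment.ipynb` would open a notebook that is already called `experiment.ipynb`. An existing file is never overwritten.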

Kolmogorov-Smirnov helps to compare the distributions of the training and test set. Use Mahalanobis first for word embeddings.

In a previous post, I described a neat trick to try when you are getting a high training score but a low test score. The idea was that maybe your test set has a different distribution from the training set, and you can actually find out whether that’s the case with a bit of help from (more) Machine Learning.

At the end of the post, I mentioned that the problem can also be tackled with a statistical test: namely, the Kolmogorov-Smirnov statistic. How does it work?

The Kolmogorov-Smirnov statistic tests the (so-called “null”) hypothesis that 2 independent samples (of numbers) are…
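As a minimal sketch of the quantity involved: the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The pure-Python function below (the name `ks_statistic` is chosen here for illustration) computes only that statistic, not the p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two numeric samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A value near 0 means the two samples look alike; a value near 1 means they barely overlap. For one numeric feature, `sample_a` could be its training-set values and `sample_b` its test-set values. In practice, `scipy.stats.ks_2samp` returns both the statistic and a p-value.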

If you are a Data Scientist, this has probably happened to you: you got excellent results for your model during the learning process, but when using the test set, or after deploying to production, you get a much lower score. Everything just goes wrong.

Source: makeameme

Am I overfitting? Do I have a bug in the code? Am I suffering from data leakage?

Sometimes, the distribution of data in the test set is very different from that of the training/validation set. Maybe you are working with time series and the test data belongs to our post April 2020 world 🦠. …

…and they will probably get weirder with time.

These last couple of years were kind of a boom for image recognition tools. Especially the ones that use GANs (Generative Adversarial Networks, an incredible idea) to achieve their goals.

Here are the new toys we can play with (can you imagine the level of madness of the tools that weren’t released to the public?):

  • FaceApp

Who hasn’t played with this Russian app? It lets you change your age and hairstyle, and even modify your gender, something that attracted quite a lot of attention from the transgender community. It’s available both on Android and iOS. The non-premium version is good enough…

Seeing how most people lose their money in a simulation is worth more than a thousand words

The "telar de la abundancia" (loom of abundance) scam, also known as "Flor de la abundancia" (flower of abundance) or "mandala de la abundancia" (mandala of abundance), is already well known, and it has now resurfaced using a feminist discourse.

One of the images being circulated.

The idea is very simple: look at the flower above. Each petal represents a person. Imagine that all the Fuego (Fire) petals, the ones farthest from the center, are missing. They are the ones who will make the initial investment, which is around 500 dollars; each of the 4 Aire (Air) members is responsible for recruiting 2 Fuego. …
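The numbers in this paragraph already allow a small back-of-the-envelope calculation. The sketch below uses only the figures quoted above (4 Aire, 2 Fuego recruited by each, roughly 500 dollars per Fuego); the function name is made up, and where the money goes afterwards is not covered because the excerpt is cut off.

```python
def flower_buy_in(air_members=4, fire_per_air=2, buy_in_usd=500):
    """Return (number of Fuego recruits, total money they put in) for one flower."""
    fire_members = air_members * fire_per_air  # each Aire recruits the same number of Fuego
    total_usd = fire_members * buy_in_usd      # every Fuego pays the initial investment
    return fire_members, total_usd


fire_members, total_usd = flower_buy_in()
```

With the default values, one flower needs 8 new Fuego who together put in about 4,000 dollars before anyone has received anything.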

Riding a bike in BA. Isn’t the Women’s Bridge beautiful?

Who doesn’t like a good bicycle ride in the park? Buenos Aires people (“porteños”, a word which refers to our port) apparently love it.

In this article we will summarize some analyses done with the goal of identifying and understanding social patterns hidden in Buenos Aires Ecobicis open data.

This dataset provides both the geolocation of bicycle stations and data points regarding:

  • Who (anonymized, of course) took a bicycle.
  • Where they took it from, and where they left it.
  • At exactly what time.
  • Full duration of the trip.

My good friend Fede Catalano talked quite a bit about the characteristics…

Billy Mosse

Mathematician. Data Scientist at fromAtoB. Machine Learning freelancer. Expat. I like bots.