When NOT to apply Machine Learning: a practical Aviation example

Machine learning is fascinating, even “magical” in some ways, and it is often claimed that it can solve almost any problem. Personally, whenever I face a real-world problem, I ask myself whether it can be solved with machine learning. I am one of those optimistic dudes who claim that almost every non-creative operation can be automated using machine learning. But is it really true that machine learning will outperform rule sets and heuristic approaches under every circumstance?

To ground this discussion, let’s formulate a real aviation problem in a very (really very) simplified way.

Predicting passenger arrivals

Due to new Covid-19 restrictions, we need to estimate the number of people present at an airport gate. We are given a feature table (X) with some typical variables used in A-CDM (Airport Collaborative Decision Making) information systems, such as callsign, destination airport, aircraft registration, aircraft type, day of the week, time and arrival gate. We want to predict the number of passengers (y) arriving at a certain gate of the airport. To do so, we need several years of historical data. Our system could show different behaviours:

Case A: Scheduled flight plans

[AA0001, LEBL, Monday, 9:00, C98] → 108 passengers

[AA0001, LEBL, Wednesday, 16:00, C95] → 80 passengers

[AA0001, LEBL, Monday, 9:00, C98] → 110 passengers

[AA0001, LEBL, Wednesday, 16:00, C95] → 77 passengers

Imagine wanting to predict the number of passengers arriving at gate C98 on the scheduled flight coming from LEBL next Monday at 9:00. How many passengers are we expecting? I hope you answered around 100 passengers, and indeed this was not a trick question. Now, how would you build a system to estimate the passengers? Would you try to use machine learning? In other words, would you try to find patterns in these data and try to turn them into a recipe (predictive model) for going from inputs to outputs?

No, of course you wouldn’t! You’d get your software to do exactly what you’re doing: look up an estimate in a table built from past events, perhaps using a simple statistic such as the average number of passengers. In this case there is no need to build a complicated ML model; a rule-based algorithm is enough.
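A minimal sketch of such a look-up table, using only the four Case A records listed earlier. The names `build_lookup` and `estimate_passengers` are hypothetical, and treating the whole row as the key is an assumption made for illustration; a real system would have to decide which fields identify "the same flight".

```python
from collections import defaultdict
from statistics import mean

# Case A records: (callsign, airport, day, time, gate) -> passengers
history = [
    (("AA0001", "LEBL", "Monday", "9:00", "C98"), 108),
    (("AA0001", "LEBL", "Wednesday", "16:00", "C95"), 80),
    (("AA0001", "LEBL", "Monday", "9:00", "C98"), 110),
    (("AA0001", "LEBL", "Wednesday", "16:00", "C95"), 77),
]


def build_lookup(records):
    """Average the passenger counts of all records sharing the same key."""
    grouped = defaultdict(list)
    for key, passengers in records:
        grouped[key].append(passengers)
    return {key: mean(values) for key, values in grouped.items()}


def estimate_passengers(key, table):
    """Return the historical average for this key, or None if it was never seen."""
    return table.get(key)


lookup = build_lookup(history)
monday_estimate = estimate_passengers(
    ("AA0001", "LEBL", "Monday", "9:00", "C98"), lookup
)
print(monday_estimate)
```

Note that a flight that has never been seen returns `None`: a look-up table memorises, it does not generalise.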

Case B: Disruptions and uncertainty

Now let’s look at this other case:

[AA0001, LEBL, Monday, 9:00, C98] → 108 passengers

[AA0001, LEBL, Wednesday, 16:00, C95] → 20 passengers

[AA0001, LEBL, Monday, 9:30, C98] → 140 passengers

[AA0001, LEBL, Wednesday, 18:15, C100] → 60 passengers

No pattern is obvious, at least from the sample we have. What can we do? Should we use the “magic” of machine learning? That depends: we have no certainty that a pattern connecting the inputs with the outputs exists. If there really is no connection, machine learning won’t help. So, in what cases can machine learning help us?
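As a rough way to put a number on the contrast between Case A and Case B, one could compare how much the passenger counts vary within groups of comparable flights. The sketch below assumes flights are comparable when they share callsign, airport and day of the week (time and gate are dropped because they differ in Case B), and `relative_spread` is a hypothetical helper. With two records per group this is purely illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev

# (callsign, airport, day, passengers), taken from the two cases above
case_a = [
    ("AA0001", "LEBL", "Monday", 108),
    ("AA0001", "LEBL", "Wednesday", 80),
    ("AA0001", "LEBL", "Monday", 110),
    ("AA0001", "LEBL", "Wednesday", 77),
]
case_b = [
    ("AA0001", "LEBL", "Monday", 108),
    ("AA0001", "LEBL", "Wednesday", 20),
    ("AA0001", "LEBL", "Monday", 140),
    ("AA0001", "LEBL", "Wednesday", 60),
]


def relative_spread(records):
    """Return {group: standard deviation / mean} of passengers per group."""
    grouped = defaultdict(list)
    for callsign, airport, day, passengers in records:
        grouped[(callsign, airport, day)].append(passengers)
    return {group: pstdev(v) / mean(v) for group, v in grouped.items()}


spread_a = relative_spread(case_a)
spread_b = relative_spread(case_b)
for group in spread_a:
    print(group, round(spread_a[group], 3), round(spread_b[group], 3))
```

In Case A the spread stays within a few percent of the mean, so the table average is a fair estimate. In Case B it is far larger, which tells us the table alone is not enough. It does not tell us whether other features could explain the variation.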

Pattern recognition in a non-stationary universe (or how Covid-19 is also ruining ML models)

To apply machine learning successfully, we must be confident that there is a useful pattern to find. We need to look at the data, perform some descriptive analysis and ask domain experts. Perhaps passenger arrivals are not as random as they look… but more importantly, perhaps the pattern describing passenger arrivals won’t generalise. For example, do we still expect 100 passengers on a Monday in the post-Covid-19 era?

Let’s be clear: if your historical data no longer represents the future, it doesn’t matter how good or large it is. In fact, every time you think about machine learning, ask first “have the rules changed?”. However, not everything has been ruined by Covid-19. Machine learning is still a very useful tool, and if you take a deep look into your data you may find yourself in luck, in a scenario where it applies. If there is a pattern, and you can show that the pattern is still relevant, you can still use a machine learning approach to your advantage.
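The “have the rules changed?” question can be turned into a crude first check: compare recent observations against the historical baseline. In the sketch below, `rules_changed` and its 20% tolerance are hypothetical choices, and the recent passenger counts are made-up values used only for illustration. This is a heuristic, not a statistical test.

```python
from statistics import mean


def rules_changed(historical, recent, tolerance=0.2):
    """Flag a shift when the recent mean moves more than `tolerance`
    (as a fraction of the historical mean) away from the baseline."""
    baseline = mean(historical)
    shift = abs(mean(recent) - baseline) / baseline
    return shift > tolerance


historical_mondays = [108, 110]  # Monday counts from Case A
recent_mondays = [35, 42, 38]    # hypothetical post-restriction counts

print(rules_changed(historical_mondays, recent_mondays))
print(rules_changed(historical_mondays, [107, 111]))
```

If a check like this fires, a model trained on the old data should not be trusted until the pattern has been re-validated on recent data.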

So, what should we avoid using ML for?

  • Don’t make an algorithm to regurgitate “memorized examples”. Use look-up tables for that! The beauty of ML is generalising to new scenarios.
  • Don’t force predictive models in non-stationary scenarios. They won’t work in production (and will make Machine Learning look bad).
  • Don’t use ML if there is no useful pattern in your data (and don’t invent one).

About Author

Darío Martínez

Darío is a Data Scientist who is passionate about programming, statistics and business intelligence. He is quite the scientist; any piece of code can be manipulated to predict the future. The larger and messier the dataset, the better.
