Machine learning is fascinating, even “magical” in some ways; some say it can solve any problem in the world. Personally, whenever I face a real-world problem, I ask myself whether it can be solved with machine learning. I am one of those optimistic dudes who claim that almost every non-creative task can be automated using machine learning. But is it really true that machine learning will always outperform rule sets and heuristic approaches, under every circumstance?
To explore this question, let’s formulate a real aviation problem in a very (really very) simplified way.
Due to new Covid-19 restrictions, we need to estimate the number of people present at an airport gate. We are given a feature table (X) with some typical variables used in A-CDM (Airport Collaborative Decision Making) information systems, such as callsign, destination airport, aircraft registration, aircraft type, day of the week, time or arrival gate, and we want to predict the number of passengers (y) arriving at a certain gate. To do this, we need several years of historical data. In our system we could encounter different behaviours; each record below lists callsign, airport, day, time and gate, followed by the observed number of passengers:
[AA0001, LEBL, Monday, 9:00, C98] → 108 passengers
[AA0001, LEBL, Wednesday, 16:00, C95] → 80 passengers
[AA0001, LEBL, Monday, 9:00, C98] → 110 passengers
…
[AA0001, LEBL, Wednesday, 16:00, C95] → 77 passengers
Suppose we want to predict the number of passengers arriving at gate C98 on the scheduled flight from LEBL next Monday at 9:00. How many passengers should we expect? I hope you answered around 100, and indeed this was not a trick question. Now, how would you build a system to estimate the passengers? Would you try to use machine learning? In other words, would you try to find patterns in these data and turn them into a recipe (a predictive model) for going from inputs to outputs?
No, of course you wouldn’t! You’d get your software to do exactly what you just did: look up an estimate in a table of past events, perhaps using a simple statistic such as the average number of passengers. In this case there is no need to build a complicated ML model; a rule-based algorithm is enough.
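A minimal sketch of such a rule-based estimator, assuming each historical record is a tuple (callsign, airport, day, time, gate, passengers) shaped like the rows above. The names `build_lookup` and `estimate`, and the choice of the mean as the statistic, are illustrative assumptions rather than part of any A-CDM system.

```python
from collections import defaultdict
from statistics import fmean

# Historical records: (callsign, airport, day, time, gate, passengers).
# Values copied from the first sample above; a real table would hold years of data.
HISTORY = [
    ("AA0001", "LEBL", "Monday", "9:00", "C98", 108),
    ("AA0001", "LEBL", "Wednesday", "16:00", "C95", 80),
    ("AA0001", "LEBL", "Monday", "9:00", "C98", 110),
    ("AA0001", "LEBL", "Wednesday", "16:00", "C95", 77),
]


def build_lookup(records):
    """Group past observations by their feature key and store the mean passengers."""
    groups = defaultdict(list)
    for *key, passengers in records:
        groups[tuple(key)].append(passengers)
    return {key: fmean(values) for key, values in groups.items()}


def estimate(lookup, key):
    """Return the historical mean for an exact key, or None if it was never seen."""
    return lookup.get(key)


lookup = build_lookup(HISTORY)
print(estimate(lookup, ("AA0001", "LEBL", "Monday", "9:00", "C98")))  # 109.0
print(estimate(lookup, ("AA0001", "LEBL", "Friday", "9:00", "C98")))  # None
```

Returning `None` for an unseen key makes the limitation explicit: an exact-match table cannot say anything about a combination of flight, day, time and gate it has never recorded.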
Now let’s look at another case:
[AA0001, LEBL, Monday, 9:00, C98] → 108 passengers
[AA0001, LEBL, Wednesday, 16:00, C95] → 20 passengers
[AA0001, LEBL, Monday, 9:30, C98] → 140 passengers
…
[AA0001, LEBL, Wednesday, 18:15, C100] → 60 passengers
No pattern is obvious, at least not from the sample we have. What can we do? Should we use the “magic” of machine learning? That depends: we have no guarantee that a pattern connecting the inputs with the outputs exists. If there really is no connection, machine learning won’t help. So, in which cases can machine learning help us?
To apply machine learning successfully, we must be sure that there is a useful pattern to find. We need to look at the data, perform some descriptive analysis and ask domain experts. Perhaps passenger arrivals are not as random as they look… but, more importantly, perhaps the pattern describing passenger arrivals won’t generalise. For example, should we still expect 100 passengers on a Monday in a post-Covid-19 era?
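One simple piece of descriptive analysis is to measure how much of the variation in passengers a single feature accounts for, using the correlation ratio η² (the between-group sum of squares divided by the total sum of squares). The sketch below applies it to the four rows of the second sample; with so few rows the numbers are illustrative only, and a feature with many unique values (such as time here) pushes η² towards 1 even when there is no real pattern.

```python
from collections import defaultdict
from statistics import fmean

# The four rows of the second sample: (day, time, gate, passengers).
ROWS = [
    ("Monday", "9:00", "C98", 108),
    ("Wednesday", "16:00", "C95", 20),
    ("Monday", "9:30", "C98", 140),
    ("Wednesday", "18:15", "C100", 60),
]


def correlation_ratio(categories, values):
    """Eta squared: share of the total variation explained by the group means."""
    overall = fmean(values)
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    total = sum((v - overall) ** 2 for v in values)
    return between / total if total else 0.0


passengers = [row[-1] for row in ROWS]
for index, name in enumerate(["day", "time", "gate"]):
    feature = [row[index] for row in ROWS]
    print(name, round(correlation_ratio(feature, passengers), 3))
# day 0.843
# time 1.0
# gate 0.939
```

Day of the week scoring about 0.84 hints that arrivals may not be as random as they look, but four rows cannot establish that, and time scores a perfect 1.0 only because every value is unique. This is where far more data and a domain expert's judgement are needed.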
Let’s be clear: if your data is not representative of the future, it doesn’t matter how good or how large your historical dataset is. In fact, every time you think about machine learning, first ask “have the rules changed?”. However, not everything has been ruined by Covid-19. Machine learning is still a very useful tool, and if you take a deep look into your data, you may find yourself in an applicable scenario. In that scenario, if there is a pattern and you can show that it is still relevant, you can use a machine learning approach to your advantage.
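One way to ask “have the rules changed?” in code is a time-based check: fit the same per-key average on older records and compare its error on those records with its error on the most recent ones. The records, dates and the 2020-03-01 cutoff below are hypothetical, invented only to illustrate a shift, and `drift_check` is a hypothetical helper, not an existing API.

```python
from datetime import date
from statistics import fmean

# HYPOTHETICAL records: (flight date, route key, passengers).
# Invented for illustration only; they are not real passenger counts.
RECORDS = [
    (date(2019, 11, 4), "AA0001 LEBL Monday 9:00", 108),
    (date(2019, 11, 11), "AA0001 LEBL Monday 9:00", 110),
    (date(2019, 11, 18), "AA0001 LEBL Monday 9:00", 106),
    (date(2020, 6, 1), "AA0001 LEBL Monday 9:00", 35),
    (date(2020, 6, 8), "AA0001 LEBL Monday 9:00", 41),
]


def drift_check(records, cutoff):
    """Fit per-key means on records before `cutoff`.

    Returns the mean absolute error (MAE) on those same records and on the
    records from `cutoff` onwards. A much larger recent error suggests the
    rules have changed.
    """
    past = [r for r in records if r[0] < cutoff]
    recent = [r for r in records if r[0] >= cutoff]
    means = {
        key: fmean(p for _, k, p in past if k == key)
        for key in {k for _, k, _ in past}
    }

    def mae(rows):
        # Only score keys seen before the cutoff; unseen keys have no estimate.
        errors = [abs(means[k] - p) for _, k, p in rows if k in means]
        return fmean(errors) if errors else float("nan")

    return mae(past), mae(recent)


in_sample, recent = drift_check(RECORDS, date(2020, 3, 1))
print(round(in_sample, 2), round(recent, 2))  # 1.33 70.0
```

Comparing against the in-sample error is optimistic; with real data you would also hold out an older period as a baseline, but the contrast between a small historical error and a large recent one is the signal to look for.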