Be aware of “false” precursors in Machine Learning

Seddik Belkoura

2018-05-09 10:00:47

Business and data owners in any complex environment, such as Air Transportation (AT), are keen on extracting explanations for events of interest. The term “extraction of explanations” refers to how explanations are uncovered within the data itself. With the recent explosion of computational power and growing amounts of data, the proliferation of Data Mining(1) techniques leads naturally to applications in AT. A clear example can be found in the project where data from various stakeholders (Airlines, ANSPs, etc.) will be used to explain the generation of safety events such as unstable approaches.

The general consensus on explanation extraction rests on a definition of causation (the Wiener definition, to be more exact!), under which a specific event that happens prior to the event of interest and helps to predict it must somehow explain its appearance. This has been the basis of all predictive/descriptive Machine Learning exercises. First, features are engineered in order to find the best “prior” inputs that help predict the studied event. Afterwards, validation techniques are used to make the research sound. Yet even where no technical error can be found (e.g. the data have been successfully separated into training and testing partitions, validation techniques have been applied, no over-fitting is happening, etc.), a fundamental additional step must be included that transforms the predictors found into real and useful explanations of an event.

The picture to the left illustrates the paradigm. Event C causes events A and B with two different lags. If C is somehow unavailable/invisible (which is highly probable in complex and iterating systems such as ATM, where data are separated into various private silos) and only A and B are studied, a spurious correlation will be found between events A and B, as they share common information shifted in time. Imagine that a classification algorithm was used to extract features driving unstable events.

These features are then called “predictors”, as their appearance might give you probabilistic information about the future. Even so, nothing ensures that these features provide a satisfactory explanation.
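The common-cause situation described above can be reproduced with synthetic data. The following is only a minimal sketch under stated assumptions: the lags (5 and 2 steps), the noise level, and the function names `simulate_common_cause` and `lagged_corr` are all hypothetical choices made for illustration, not taken from any real dataset or project.

```python
import numpy as np

def simulate_common_cause(n=5000, lag_a=5, lag_b=2, noise=0.5, seed=0):
    """Hidden event C drives B after lag_b steps and A after lag_a steps.
    A and B never influence each other (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)
    a = np.roll(c, lag_a) + noise * rng.normal(size=n)  # a[t] = c[t - lag_a] + noise
    b = np.roll(c, lag_b) + noise * rng.normal(size=n)  # b[t] = c[t - lag_b] + noise
    start = max(lag_a, lag_b)  # drop the samples affected by np.roll wrap-around
    return c[start:], a[start:], b[start:]

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    return np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]

c, a, b = simulate_common_cause()

# Pretend C is invisible and search for the lag at which B best "predicts" A.
best_lag = max(range(10), key=lambda k: abs(lagged_corr(b, a, k)))
print("lag at which B best predicts A:", best_lag)
print("correlation at that lag:", round(lagged_corr(b, a, best_lag), 2))
```

In this toy setting B looks like a strong predictor of A at a lag equal to the difference between the two hidden lags, even though inhibiting B would change nothing about A.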

This paradigm gains even more importance in safety-related operational environments. In manageable systems such as those studied in biology, spurious correlation can be detected through experiments that inhibit event B and check whether event A still happens, thereby assessing whether a causal link exists between the two events. When human safety is at stake, such “experiments” are not possible. The question then becomes how to ensure that the predictors found by the Data Mining algorithm are not the result of spurious correlation.
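The inhibition experiment mentioned above can be mimicked in a toy simulation, which is the only place it is safe to run one. The sketch below is an assumption-laden illustration: the event probabilities, the two “worlds” (B genuinely causes A, or hidden C causes both), and the function name `rate_of_a` are hypothetical.

```python
import numpy as np

def rate_of_a(b_causes_a, inhibit_b, n=20000, seed=0):
    """Return how often event A occurs in a simulated world.

    b_causes_a=True : A is triggered by B (genuine causal link).
    b_causes_a=False: A is triggered by the hidden event C, which also triggers B.
    inhibit_b=True  : the experiment forces B never to happen.
    """
    rng = np.random.default_rng(seed)
    c = rng.random(n) < 0.3            # hidden event C occurs 30% of the time
    b = c & (rng.random(n) < 0.9)      # C triggers B in 90% of cases
    if inhibit_b:
        b = np.zeros(n, dtype=bool)    # intervention: B is suppressed
    driver = b if b_causes_a else c    # what actually triggers A
    a = driver & (rng.random(n) < 0.9)
    return a.mean()

for world in (True, False):
    label = "B causes A" if world else "C causes A and B"
    print(label,
          "| A rate, normal:", round(rate_of_a(world, False), 3),
          "| A rate, B inhibited:", round(rate_of_a(world, True), 3))
```

When B truly causes A, suppressing B makes A disappear; when both are driven by C, A keeps occurring at the same rate. Observational data alone cannot distinguish the two worlds, which is why the additional validation step is needed.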

Expertise or human logic, together with additional machine learning, is what can confirm whether an explanation is sound or not. A post-analysis of the results helps to understand whether the explanation behind the link found by the algorithm is insightful or misleading. Sometimes, that explanation is logical. Sometimes, the road to finding the logical explanation is not straightforward, as some intermediate steps are necessary. Sometimes, the link is simply spurious, which can be discovered only by finding the hidden dimension driving both events.
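Once a candidate hidden dimension has been identified, one simple way to check it is a partial correlation: remove what the hidden variable explains from both events and see whether any link remains. This is a minimal sketch, assuming linear relationships and series that have already been aligned on their respective lags; the synthetic data and the helper names `residual` and `partial_corr` are hypothetical.

```python
import numpy as np

def residual(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def partial_corr(a, b, c):
    """Correlation between a and b after removing the linear effect of c from both."""
    return np.corrcoef(residual(a, c), residual(b, c))[0, 1]

# Synthetic, lag-aligned data: C drives both A and B; A and B do not interact.
rng = np.random.default_rng(1)
c = rng.normal(size=5000)
a = c + 0.5 * rng.normal(size=5000)
b = c + 0.5 * rng.normal(size=5000)

raw = np.corrcoef(a, b)[0, 1]
controlled = partial_corr(a, b, c)
print("correlation of A and B:", round(raw, 2))
print("same, controlling for C:", round(controlled, 2))
```

A strong raw correlation that collapses towards zero once C is controlled for suggests that the A-B link was spurious. The check only works if the hidden dimension is actually available, which is where expert knowledge comes in.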

The message of this blog is twofold and profoundly optimistic. First, precursors are not derived simply from Data Mining applications: additional efforts are necessary, sometimes as simple as an expert validation. Another such effort might be a dedicated study that focuses on a specific interaction, with the aim of validating its explanatory nature and separating true explanations from spurious correlations. Second, the contribution of Machine Learning remains essential: it allows the high complexity of the initial problem to be narrowed down and partitioned into smaller, “simpler connections” that might be solved by the human mind. The combination of these two steps would allow Machine Learning to spread into sensitive, real-world applications such as safety matters.

(1) Here, Data Mining and Machine Learning can be used interchangeably.
