One of the main concepts behind Machine Learning is stationarity, i.e. the assumption that the characteristics of a system do not vary through time. In other words, to properly translate an operational problem into a Data Mining problem, one must ensure that the data used during training is representative of both the current situation and the future. In most real-world applications, stationarity cannot be guaranteed, yet it is assumed to hold over a certain time window depending on the system's dynamics. For example, although the human phenotype has changed over the many years of our species' existence, it can safely be assumed that no radical changes will occur within the next few years.
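As a minimal sketch of how such a time-window assumption can be probed in practice (an illustration, not part of the original study), one can compare the distribution of an input feature in the training window against a recent window, e.g. with a two-sample Kolmogorov–Smirnov test. All variable names and the simulated values below are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values: a training window and a more recent window.
# The "recent" sample is deliberately shifted to simulate non-stationarity.
training_window = rng.normal(loc=55.0, scale=5.0, size=1000)
recent_window = rng.normal(loc=58.0, scale=5.0, size=1000)

# Two-sample KS test: a small p-value suggests the two samples do not come
# from the same distribution, i.e. the stationarity assumption is suspect.
statistic, p_value = ks_2samp(training_window, recent_window)
if p_value < 0.01:
    print(f"Possible drift (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No evidence of drift at the 1% level")
```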
When applying Data Science to any real-world application, stationarity is implicitly taken into account. In other words, when a model is trained to classify or to predict, it takes for granted that the system under study does not change its internal dynamics (or at least not those dynamics that might affect the performance of the model). Because such an assumption may become invalid, the performance of the model is monitored: when dysfunctions are identified, the model is flagged for recalibration. Predictive models in finance, for instance, are routinely re-calibrated after a short span of time because financial markets are not stationary.
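A minimal sketch of this monitor-and-recalibrate loop is shown below; it assumes a hypothetical baseline error measured on held-out data at training time, and the class name, threshold and window size are illustrative choices rather than anything prescribed by the original study.

```python
from collections import deque


class PerformanceMonitor:
    """Hypothetical sketch: flag a model for recalibration when its rolling
    error exceeds a tolerance above the error observed at training time."""

    def __init__(self, baseline_error: float, tolerance: float = 0.2,
                 window: int = 100):
        self.baseline_error = baseline_error  # error on held-out training data
        self.tolerance = tolerance            # allowed relative degradation
        self.errors = deque(maxlen=window)    # rolling window of recent errors

    def record(self, prediction: float, actual: float) -> bool:
        """Record one outcome; return True if recalibration is recommended."""
        self.errors.append(abs(prediction - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough recent evidence yet
        rolling_error = sum(self.errors) / len(self.errors)
        return rolling_error > self.baseline_error * (1.0 + self.tolerance)


# Hypothetical usage: baseline MAE of 4 seconds measured at training time.
monitor = PerformanceMonitor(baseline_error=4.0)
needs_recalibration = monitor.record(prediction=57.0, actual=62.5)
```

In practice, the recalibration itself would re-run the training pipeline on a window of recent data; the tolerance and window size are system-dependent choices.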
Such an approach can be sustainable for some environments, such as financial markets. However, aviation, and in particular aviation safety, cannot risk drops in the performance of models on which human lives depend. Can an airport be considered a stationary system? Can we predict the future ROT of an aircraft for the coming months or years? As long as no new procedures are introduced, one can safely assume that Air Traffic Control rules do not change significantly enough to perturb the validity of a Data Mining exercise. This means that, under the assumption that the way landings and departures are handled does not change, the validity of the model can be ensured.
Yet, the previous statement also covers non-trivial situations. Changes in internal dynamics are naturally linked to the addition of a new procedure or the modification of an existing one, and the eventual effects of such changes on the model can be assessed beforehand. However, changes might also stem from current procedures that are hidden from the data or simply not recorded. The ROT prediction example introduced previously helps to clarify this concept. The performance of a tool predicting ROT should remain stable as long as the system is considered stationary. Yet there are operational procedures, hidden from the data scientists, that might cause a significant drop in model performance because they change the way flights are handled. Indeed, ATCO–pilot interactions are not duly documented. A hasty data scientist might deliver a model without taking these interactions into account precisely because they are not explicitly documented. Such an eventuality should be avoided: an ATCO who trusts the algorithm will stop interacting with the pilot and thereby indirectly change the internal procedures of the system, thus invalidating the model.
Confronted with such situations within the scope of SafeClouds.eu, the team drew two important conclusions: