Data Science and Stationarity in Aviation

Seddik Belkoura

2018-06-28 09:19:10
Reading Time: 3 minutes

One of the main concepts behind Machine Learning is stationarity, i.e. the assumption that the characteristics of a system do not vary through time. In other words, to properly translate an operational problem into a Data Mining problem, one must ensure that the data used during training is representative of both the current situation and the future. In most real-world applications, stationarity cannot be guaranteed, yet it is assumed to hold for a certain time window, depending on the system’s dynamics. For example, though the human phenotype has changed over many years of existence, it can safely be assumed that no radical changes will happen in the coming years.
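As a minimal sketch of what checking this assumption can look like in practice, the snippet below applies an augmented Dickey-Fuller test from statsmodels to a time series; the synthetic series and the significance threshold are illustrative assumptions, not part of any analysis described in this post.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Illustrative series: a stationary signal plus a slow drift,
# standing in for any operational metric observed over time.
rng = np.random.default_rng(42)
series = rng.normal(size=500) + np.linspace(0, 3, 500)

# Augmented Dickey-Fuller test: the null hypothesis is that the
# series has a unit root, i.e. that it is NOT stationary.
statistic, p_value, *_ = adfuller(series)

if p_value < 0.05:
    print(f"p={p_value:.3f}: stationarity is plausible for this window")
else:
    print(f"p={p_value:.3f}: evidence of non-stationarity; "
          "training data may not represent the future")
```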

When applying Data Science to any real-world application, stationarity is implicitly taken into account. In other words, when a model is trained to classify or to predict, it assumes as a matter of fact that the system under study does not change its internal dynamics (or at least not the dynamics that might affect the performance of the model). Because such an assumption may not hold forever, the performance of the model is monitored, and when degradation is detected, the model is flagged for recalibration. Predictive models in finance, for instance, are re-calibrated after short spans of time because financial markets are not stationary.
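Such a monitoring loop can be as simple as comparing recent prediction error against the error measured at training time. The sketch below is a hypothetical illustration under assumed values for the window size and tolerance; it is not a description of any production system.

```python
from collections import deque
import numpy as np

class DriftMonitor:
    """Flag a model for recalibration when its recent error
    drifts away from the error measured at validation time."""

    def __init__(self, baseline_error, window=200, tolerance=1.5):
        self.baseline = baseline_error      # error on held-out data at training time
        self.recent = deque(maxlen=window)  # rolling buffer of live errors
        self.tolerance = tolerance          # allowed degradation factor

    def update(self, prediction, actual):
        """Record one live error; return True if recalibration is due."""
        self.recent.append(abs(prediction - actual))
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough evidence yet
        return np.mean(self.recent) > self.tolerance * self.baseline

# Usage: feed each new (prediction, observation) pair as it arrives.
monitor = DriftMonitor(baseline_error=4.2)  # e.g. mean absolute error in seconds
# if monitor.update(pred, obs): trigger recalibration
```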

Such approaches can be sustainable in some environments, such as financial markets. However, aviation, and in particular safety in aviation, cannot risk drops in the performance of models on which human lives depend. Can an airport be considered a stationary system? Can we predict the runway occupancy time (ROT) of a plane for the coming months or years? As long as no new procedures are introduced, one can safely assume that air traffic control rules do not change significantly enough to perturb the validity of a Data Mining exercise. This means that, under the assumption that nothing changes in how landings and departures are handled, the validity of the model can be ensured.
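To make the ROT example concrete, such a prediction model might look like the minimal sketch below. The feature names and the synthetic data are purely illustrative assumptions, not the SafeClouds.eu setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-ins for hypothetical landing features:
# approach speed, wake category, runway exit distance, headwind.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
rot = 55 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=2, size=2000)  # seconds

X_train, X_test, y_train, y_test = train_test_split(X, rot, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)

print(f"MAE: {mean_absolute_error(y_test, model.predict(X_test)):.1f} s")
# The stationarity assumption: this error only holds while landing
# procedures stay as they were during the training window.
```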

Yet this assumption also hides non-trivial situations. Internal dynamic changes are naturally linked with the addition of a new procedure or the modification of an existing one, and the eventual effects of such changes on the model can be assessed beforehand. However, changes might also be due to current procedures that are hidden from the data or simply not recorded. The ROT prediction example introduced above helps to clarify this concept. A tool predicting ROT should remain stable as long as the system is considered stationary. Yet there are operational procedures, hidden from the data scientists, that might cause a significant drop in the model's performance because they change the way flights are handled. Indeed, interactions between air traffic controllers (ATCOs) and pilots are not duly documented. A hasty data scientist might deliver a model without taking these interactions into account, precisely because they are not explicitly documented. Such an eventuality should be avoided: an ATCO who trusts the algorithm will stop interacting with the pilot, indirectly changing the internal procedures of the system and thus invalidating the model.

Confronted with such situations within the scope of SafeClouds.eu, the team drew two important conclusions:

  1. The interaction between Data Scientists and operational users is essential. Operational users are the ones aware of potential procedures that are not documented. The Data Scientist must ask relevant questions in order to properly understand the problem and to guide the operational users towards the discovery of potential bottlenecks or hidden dimensions (such as non-documented ATCO-pilot interactions). Once these are identified, the Data Scientist has a wide range of techniques at their disposal to assess the effect of such hidden dimensions on the model; for instance, unsupervised learning can be used to infer verbal interactions from their effect on velocity patterns (a minimal sketch of this idea follows the list). Thanks to this technical-operational interactive work, the applicability of AI solutions in safety can be assessed with good risk management.
  2. It is very important for aviation to document everything, as the value of a piece of information cannot easily be assessed upfront and might prove useful for a later application. We understand the difficult nature of some data, such as verbal exchanges (different languages, radio quality, etc.), yet the rise of Deep Learning and the emergence of powerful tools such as Google's TensorFlow raise confidence in our future ability to recognize speech interactions more and more precisely. This is a field of research that should not be abandoned.
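As a sketch of the unsupervised idea mentioned in the first conclusion, with entirely synthetic deceleration profiles standing in for real trajectories, clustering can separate flights whose velocity patterns suggest an undocumented intervention:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic deceleration profiles after touchdown: velocity sampled
# at 10 points. One group decelerates smoothly; the other brakes
# early, as if reacting to a verbal ATCO instruction.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 10)
smooth = 140 * (1 - t)[None, :] + rng.normal(scale=3, size=(100, 10))
early = 140 * (1 - t) ** 2 + rng.normal(scale=3, size=(100, 10))
profiles = np.vstack([smooth, early])

# Two clusters as a first hypothesis: "instructed" vs "uninstructed".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(labels))  # roughly 100 flights per cluster

# Flights in the anomalous cluster can then be reviewed with
# operational users to confirm the hidden interaction.
```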

© datascience.aero