Machine learning is producing outstanding results, although we know it is still far from emulating human intelligence. Applying machine learning techniques, including multi-level artificial neural networks (deep learning), to tasks such as speech or image recognition has produced steadily improving results (e.g. digital assistants like Apple's Siri or Amazon's Alexa). In spite of the significant progress achieved so far, some challenges still need to be resolved before machine learning can be applied in most industries. On the one hand, we face a fragmented ecosystem: there is a gap between data scientists and the domain experts working in each particular sector. Converting data into knowledge requires collaboration between both kinds of expertise. On the other hand, challenges related to data management and data analysis need to be addressed before machine learning techniques can be implemented in most industries. These challenges include, to name just a few, heterogeneous and distributed data sources, data validation, distributed data architectures, data security, scalability, real-time analysis and decision support, and data visualization.
However, we must not make the mistake of assuming that a machine learning problem can be addressed through a generic, standard application of a set of algorithms and techniques. Machine learning problems are highly case-dependent, so the purpose of the analysis needs to be carefully defined in advance. This is what we at Innaxis call Purposeful Knowledge Discovery, which was also the title of the keynote speech given by Innaxis President Carlos Alvarez Pereira at the SESAR Innovation Days 2017 in Belgrade. This is precisely the approach we follow in our data science research projects, such as SafeClouds.eu, an H2020 project aimed at enhancing aviation safety through the application of data science techniques.
SafeClouds.eu brings together 16 partners: data scientists and engineers from several research entities (Innaxis, Tadorea, Fraunhofer, TU Munich, Linköping University, TU Delft and CRIDA) and a group of airlines, ANSPs and safety authorities (Iberia, Air Europa, Vueling, Norwegian, Pegasus, LFV, Eurocontrol, AESA and EASA). These airspace stakeholders form the project's user group; in other words, they define the questions that the data should answer. These questions can be of three types: descriptive (what happened?), predictive (what will happen?) or prescriptive (what should we do to make the desired outcome happen?). Once the questions (the SafeClouds.eu use cases) are defined, the data scientists and engineers work together with the users across the full data science cycle: data management, data processing architecture, deep analytics, data protection, pseudo-anonymization, advanced visualization and user experience. As mentioned above, every step has its own challenges, since there are no standard data science tools that can be transferred automatically from one field to another. Below we outline two of them: fusing proprietary, confidential data and benchmarking among competing stakeholders.
- Smart Data Fusion: Simply erasing the flight-identifier parameters would protect the data, but it would also prevent the datasets from being fused. Many datasets (e.g. flight data monitoring (FDM) data and radar tracks) require protection and cannot be shared, so fusion needs more sophisticated techniques from cryptography that encode sensitive data in a non-reversible way while still allowing records to be matched.
- Secure Blind Benchmarking: Benchmarking among stakeholders based on data that cannot be shared also requires specific techniques. These include secure multiparty computation, which allows confidential data to be compared without disclosing the data to anyone, not even to a trusted third party.
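The fusion idea in the first item can be illustrated with a keyed one-way hash. The sketch below is a minimal example under stated assumptions, not the SafeClouds.eu implementation: it assumes the data owners share a secret key, uses HMAC-SHA256 from the Python standard library to turn each flight identifier into a pseudonymous token, and then joins two protected datasets on that token. The key, the field names (`flight_id`, `token`, `max_vert_g`, `min_sep_nm`) and the records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret: in practice it would be generated and held by
# the data owners (or a key-management service), never by the analysts.
SECRET = b"example-shared-secret"


def pseudonymize(flight_id: str, secret: bytes = SECRET) -> str:
    """Keyed one-way coding of a flight identifier (HMAC-SHA256, hex digest)."""
    return hmac.new(secret, flight_id.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(records, id_field="flight_id", secret=SECRET):
    """Return copies of the records with the identifier replaced by a token."""
    protected = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left untouched
        rec["token"] = pseudonymize(rec.pop(id_field), secret)
        protected.append(rec)
    return protected


def fuse(left, right):
    """Inner join of two protected datasets on their pseudonymous token."""
    by_token = {rec["token"]: rec for rec in right}
    return [{**rec, **by_token[rec["token"]]} for rec in left if rec["token"] in by_token]


# Hypothetical records: field names and values are made up for illustration.
fdm = [
    {"flight_id": "IBE3456/2017-11-28", "max_vert_g": 1.42},
    {"flight_id": "VLG1001/2017-11-28", "max_vert_g": 1.18},
]
radar = [
    {"flight_id": "IBE3456/2017-11-28", "min_sep_nm": 4.7},
    {"flight_id": "NAX2200/2017-11-28", "min_sep_nm": 5.3},
]

fused = fuse(protect(fdm), protect(radar))
print(len(fused))  # 1: only IBE3456 appears in both datasets
```

Because the same identifier and key always produce the same token, records can be matched without anyone seeing the identifier, and without the key a token can be neither reversed nor recomputed. Flight identifiers have low entropy, however, so anyone holding the key could rebuild the mapping by brute force; keeping the key away from the analysts is what makes the coding effectively non-reversible.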
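For the blind benchmarking in the second item, one of the simplest secure multiparty computation building blocks is additive secret sharing. The sketch below simulates, in a single process, several stakeholders computing the sum (and hence the average) of a private safety indicator: each party splits its value into random shares, keeps one and sends one to each other party, and only the resulting partial sums are published. The indicator, the party names, the modulus and the values are assumptions for illustration; the protocol assumes honest-but-curious parties and private channels, and it is not necessarily the technique used in SafeClouds.eu.

```python
import secrets

# Field modulus (a Mersenne prime); it must exceed any possible total.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate the protocol: returns the total, never an individual value."""
    n = len(private_values)
    # Party i splits its value; share j of party i goes to party j.
    dealt = [share(v, n) for v in private_values]
    # Party j adds the shares it received. On its own, each partial sum
    # looks uniformly random and reveals nothing about any single input.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they give the total.
    return sum(partial_sums) % PRIME


# Hypothetical indicator: events per 10,000 flights for four anonymous airlines.
private_rates = {"airline_A": 12, "airline_B": 7, "airline_C": 19, "airline_D": 4}

total = secure_sum(list(private_rates.values()))
average = total / len(private_rates)
print(total, average)  # 42 10.5

# Each airline compares its own figure with the average locally.
own = private_rates["airline_A"]
print("airline_A is above average" if own > average else "airline_A is at or below average")
```

Each airline compares its own value against the published average locally, so no individual figure ever leaves its owner in the clear. The aggregate itself is disclosed by design, so with very few participants it can still leak information about the others.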
These are just some examples of the challenges the SafeClouds.eu team is facing in the field of aviation safety data analysis. The solutions these techniques offer make them well suited to other fields, such as fuel consumption analysis, but, again, the purpose of the analysis will determine the next steps.