Data Privacy and Confidentiality

David Perez

2017-11-01 12:10:37

Applying data analytics to aviation requires access to confidential data sets generated by stakeholders such as airlines or air navigation service providers. Because some of this data is commercially sensitive (fuel consumption, for example) and some is sensitive in its own right (safety events), data privacy is paramount. On occasion, personal data may also be relevant, e.g. when evaluating whether specific training affects operations.

Respecting individual privacy and enterprise data confidentiality is a primary focus for all data analytic programmes. This calls for strong data protection mechanisms: data analytic suppliers, who may be collecting and storing data, must guarantee that their products and services are free of software bugs that could allow access to the protected data. The complexity of the software supply chain, however, makes data storage and management a significant liability.

De-identification or pseudonymization techniques are often used to preserve confidentiality while still allowing stakeholders to work with the data. These techniques hide certain fields by masking them, either temporarily or permanently. If no data fusion is needed, this can be done on-site, within the data owner’s premises. However, many data analytic applications require confidential data to be fused with one or more other sources, for instance to provide a more complete view of a particular event or to benchmark against another industry player. The usual solution is to identify a trusted third party. This independent party collects the confidential data, ensures strong data protection mechanisms, fuses the confidential data sources to compute the desired analytics, and anonymises the data feeds, frequently irreversibly. This is the approach taken, for instance, by safety data sharing programmes.
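
As a rough illustration of the masking and pseudonymization step described above, the following Python sketch hides one field outright and replaces another with a keyed hash. The field names (`pilot_id`, `airline`, `fuel_kg`), the sample record, and the key are hypothetical and chosen only for the example. HMAC-SHA256 is one possible choice of keyed hash, not necessarily what any particular programme uses.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for illustration.
PSEUDONYMIZED_FIELDS = {"pilot_id"}   # replaced by a keyed hash
MASKED_FIELDS = {"airline"}           # hidden outright


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable keyed hash of `value` (HMAC-SHA256, truncated to 16 hex chars)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def de_identify(record: dict, key: bytes) -> dict:
    """Copy `record`, masking or pseudonymizing the configured fields."""
    out = {}
    for field, value in record.items():
        if field in MASKED_FIELDS:
            out[field] = "***"
        elif field in PSEUDONYMIZED_FIELDS:
            out[field] = pseudonymize(str(value), key)
        else:
            out[field] = value
    return out


# Hypothetical record; the key is assumed to stay on the data owner's premises.
record = {"pilot_id": "P-1042", "airline": "ExampleAir", "fuel_kg": 5300}
key = b"secret held by the data owner"
safe = de_identify(record, key)
```

Because the same key always yields the same pseudonym, records belonging to one individual can still be linked for analysis. Whoever holds the key can recompute the mapping, whereas a party without the key cannot.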

While the trusted third-party model is a viable option, it has some limitations. First of all, trust is difficult to earn and very easy to lose: a single error can leave the procedure untrusted and render it completely useless. Additionally, confidential data needs to travel to a central location, which can also be a point of concern for data owners.

Alternative options, currently in the development phase, are also worth mentioning. For instance, cryptography in modern infrastructures such as blockchain has enabled technological solutions that maintain data security while preserving usefulness. Secure computation, or privacy-preserving computation, is a field that studies how multiple parties can compute analytics together while keeping their inputs confidential. Secure computation solutions enable the desired balance for potential data-driven applications without the need to find a trusted third party.
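
To give a flavour of how secure computation can work, the sketch below uses additive secret sharing, one of the simplest building blocks in the field, to let several parties compute a total without any of them revealing its own input. It is a toy simulation in a single process, assuming honest parties that do not collude. The function names, the modulus, and the fuel figures are illustrative assumptions, not taken from any of the projects mentioned here.

```python
import secrets

# Assumption: the modulus is larger than any total the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate a secure sum: each party only ever sees random-looking shares."""
    n = len(private_values)
    # Each party splits its private value and sends one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received from all parties.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; together they give the total.
    return sum(partial_sums) % MODULUS


# Hypothetical private inputs, e.g. fuel consumption per airline in kg.
fuel_kg = [5300, 6100, 4800]
total = secure_sum(fuel_kg)
```

Each individual share is a uniformly random number, so a party holding one share learns nothing about the value it came from. Only the combined result is meaningful, which is the property that removes the need for a central trusted party in this simple case.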

The Secure Data Cloud project was the first initiative to look into the applicability of these concepts to aviation scenarios, as early as 2013. Ongoing projects such as SafeClouds.eu are developing these concepts further for potential applications in aviation safety.

© datascience.aero