In the last several years, an explosion of workshops, conferences and symposia, in addition to books, reports and blogs, has covered the use of data in different fields, including aviation. Using aviation data to derive new processes that improve, for instance, airport or airline operations is becoming a trendy topic. Some of those activities, trying to gain attention in an already crowded field, use a variation of the words “data”, “data driven” or “big data”. Some refer to the techniques: “data analytics”, “machine learning” or “artificial intelligence”. Other related terms are “deep learning”, “operations research” and even the “internet of things”.
In view of this large pool of efforts, it could be helpful to clarify the difference between some of the terms used: statistical analysis, machine learning, artificial intelligence and data science.
First of all, the term “data driven” is probably one of the most used and, paradoxically, probably the one most worth avoiding. Many classical techniques already use data as the core of their design, such as Monte Carlo simulations, available since the 1930s. It can be argued that a technique or operational concept is “data driven” whenever data is at the core of its design. However, “data driven” as a concept is considered very generic these days and does not clarify how data is actually used. We recommend avoiding this expression as it is mostly meaningless.
Within the data analytics field, we would like to mention statistics, artificial intelligence and machine learning.
Statistics is even older, with initial developments dating back to the 1600s. As an academic discipline, statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation and organization of data. The field adds significant value to any research effort, and one may argue that statistics should be the first and fundamental step before any other related technique is developed. Even today, many reports are based on statistics and provide considerable information to aviation industry professionals. Statistics may offer some degree of forecasting, but only when the relationships between the variables are simple enough to be presented to, and directly read by, humans.
Artificial intelligence is also a broad field, but a more modern one. Although some of its ideas were presented in ancient times, the field is generally considered to have been born at a workshop at Dartmouth College in 1956. Artificial intelligence is usually presented in contrast with the natural intelligence displayed by humans and other animals. The domain includes not just techniques that use large amounts of data, but also others, like agent-based modelling, which have been used in aviation.
Different techniques are part of this broad field of artificial intelligence, but in general, artificial intelligence refers to the provision of techniques for solving tasks that are easy for humans but hard for computers. Autonomous driving comes to mind as the obvious example in current times. Lately, the field of robotics has offered very good examples of tasks that are doable by humans and traditionally difficult for machines. Aviation must quickly learn from those.
Within the field of artificial intelligence, machine learning is a set of techniques and algorithms that use large quantities of data to achieve certain AI goals. Machines acquire the ability to learn different tasks, some of which are very sophisticated and yet trivial to humans, like vision or character recognition. These machines are not explicitly programmed for these tasks but rather perform them through general algorithms designed to be trained with data. Depending on whether or not the training data was previously labeled by humans, the learning process is called supervised or unsupervised learning. Either way, those algorithms, once trained, are capable of making predictions or decisions, weighing the features that provide the best result. Though machine learning has roots in the 1950s and gained momentum in the 1980s, only recently have computers been able to store and process the enormous quantities of data required to train these algorithms. Artificial neural networks come to mind, and there are a few examples of research projects in aviation, like SafeClouds. Within machine learning, deep learning is a subset of algorithms that use multiple layers for feature extraction, mainly based on artificial neural networks.
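To make the supervised versus unsupervised distinction concrete, here is a minimal sketch in plain Python. The flight data, the feature choice (taxi-out time and departure delay) and all function names are hypothetical and chosen only for illustration. A nearest-centroid classifier stands in for supervised learning and a basic k-means loop for unsupervised learning; neither is meant to represent what any particular aviation project uses.

```python
import random
from statistics import mean

# Hypothetical toy data: (taxi-out minutes, departure delay minutes) per flight.
on_time = [(10.0, 2.0), (12.0, 0.0), (11.0, 3.0), (9.0, 1.0)]
delayed = [(25.0, 40.0), (28.0, 35.0), (30.0, 45.0), (27.0, 38.0)]
points = on_time + delayed
labels = ["on_time"] * len(on_time) + ["delayed"] * len(delayed)


def centroid(rows):
    """Component-wise mean of a list of 2-D points."""
    return (mean(r[0] for r in rows), mean(r[1] for r in rows))


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit_supervised(points, labels):
    """Supervised: human-provided labels decide which points form each class."""
    classes = {}
    for p, y in zip(points, labels):
        classes.setdefault(y, []).append(p)
    return {y: centroid(rows) for y, rows in classes.items()}


def predict(model, p):
    """Return the label whose class centroid is closest to point p."""
    return min(model, key=lambda y: sq_dist(model[y], p))


def fit_unsupervised(points, k=2, iterations=10, seed=0):
    """Unsupervised: k-means groups the points without ever seeing a label."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(centers[i], p))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


model = fit_supervised(points, labels)
print(predict(model, (26.0, 37.0)))   # classified using the learned class centroids
print(sorted(fit_unsupervised(points)))  # cluster centers found without labels
```

On data this cleanly separated, both approaches recover the same two groups. The difference is that the supervised model can name them ("on_time", "delayed"), while the unsupervised one only reports that two groups exist and leaves their interpretation to the analyst.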
However, data analytics involves more than one challenge. For the most successful algorithms (e.g. artificial neural networks) to thrive, we must solve several problems. First, we require specific techniques to acquire, clean, fuse and semantically describe the different datasets available. Safeclouds.eu is using more than 10 different data sources to describe some operational challenges, and fusing all those datasets requires specific data management techniques. Second, all these datasets need to be processed on a computing architecture that is scalable and that puts the data scientist at the center. Aviation data analytics requires a particular infrastructure, which we have written about before. Third, data are often proprietary and/or confidential, and protecting the data while enabling analytics requires specific techniques. Cryptography helps here: a variety of techniques, like secure multiparty computation or fully homomorphic encryption, show promise in removing some of the confidentiality barriers to a broader use of data, although they still require some research. Last, but not least, data need to be consumed by humans, so visualization, as well as the user experience, needs to be taken into account in a full-stack data solution.
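As an illustration of the idea behind secure multiparty computation mentioned above, the following sketch uses additive secret sharing to compute a joint total without any party revealing its own figure. It is a toy simulation that runs in a single process, not a real networked protocol. The airline counts, the modulus and the function names are all assumptions made for the example, and it does not describe how SafeClouds or any other project actually protects its data.

```python
import secrets

# Assumed modulus for the example; all arithmetic is done modulo this prime.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; any incomplete subset looks like random numbers."""
    return sum(shares) % PRIME


def secure_sum(private_values):
    """Simulate n parties computing the sum of their private values."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that total.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


# Hypothetical example: three airlines each hold a confidential event count.
print(secure_sum([12, 7, 30]))  # the joint total, with no single count disclosed
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about another party's value; only the published partial sums, combined, reveal the total. Real deployments add authenticated channels and protection against dishonest participants, which this sketch leaves out.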
All these fields, together with deep analytics, constitute the field of Data Science. We feel this term encompasses all the necessary techniques, tools and methodologies required to advance knowledge on the use of data in aviation.