In the aviation world, plenty of data-driven use cases arise, not only within aviation organizations such as airlines, manufacturers or ANSPs, but also in R&D projects. Almost all of them are limited by the same constraint: the availability of data. A reasonable amount of data is needed to guarantee minimally reliable results, yet in most cases data sources are private, which greatly limits the potential of innovative solutions. It is therefore crucial, first, to identify publicly available data sources and, second, to understand the case studies that can exploit these data sets.
In this post, we will delve into one of the most popular data sources in the aviation industry: ADS-B. Automatic Dependent Surveillance-Broadcast (ADS-B) is a surveillance technology in which an aircraft determines its position via satellite navigation and periodically broadcasts it, enabling it to be tracked from the ground. Ground stations equipped with antennae continuously collect this data, building up huge data stores that cover airspace routes worldwide. Fortunately, plenty of historical ADS-B data exists, and it is quite easy to access. One disadvantage of ADS-B is incomplete ground receiver coverage: shadow regions without antennae provide no data, which can lead to gaps or slight inconsistencies along the flight track. Throughout this post, assume that ADS-B data includes, among others, these columns: altitude, callsign, heading, latitude, longitude, time, velocity, vertical rate and on_ground.
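To make the column list and the coverage problem concrete, here is a minimal sketch in plain Python. The `AdsbRecord` class, the `find_coverage_gaps` helper and the 60-second threshold are our own illustrative assumptions, not part of any ADS-B provider's API, and the units noted in the comments are assumed as well.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdsbRecord:
    """One ADS-B sample with the columns listed above (units are assumptions)."""
    time: float           # seconds since some epoch (assumed)
    callsign: str
    latitude: float       # degrees
    longitude: float      # degrees
    altitude: float       # metres (assumed)
    velocity: float       # ground speed, m/s (assumed)
    heading: float        # degrees from north
    vertical_rate: float  # m/s, positive when climbing (assumed)
    on_ground: bool


def find_coverage_gaps(records: List[AdsbRecord],
                       max_gap_s: float = 60.0) -> List[Tuple[float, float]]:
    """Return (start, end) times where consecutive samples are more than
    max_gap_s seconds apart, a rough proxy for a shadow region."""
    ordered = sorted(records, key=lambda r: r.time)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr.time - prev.time > max_gap_s:
            gaps.append((prev.time, curr.time))
    return gaps


# Hypothetical track: samples every 10 s, with a 180 s hole after t = 20.
track = [
    AdsbRecord(t, "ABC123", 40.0, -3.0, 10000.0, 230.0, 90.0, 0.0, False)
    for t in (0, 10, 20, 200, 210)
]
gaps = find_coverage_gaps(track)
print(gaps)
```

A check like this is worth running before any modelling, since a flight with long gaps may need interpolation or may have to be discarded altogether.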
We are going to present five simple data-driven problems that, in our humble opinion, are very topical in aviation today, and apply machine learning techniques to solve them. In particular, we will approach all of them assuming that ADS-B is the only data source available. In addition, we will propose other aviation data sources (Ref. Aviation Data Compendium) that might complement the information provided by ADS-B.
1 – Predicting Go-Around maneuvers
Predicting go-arounds is quite an interesting problem to solve: a reliable prediction can help mitigate the effects that follow the maneuver. From a machine learning point of view, the main difficulty of this use case is class imbalance. Since the event happens in about 0.3% of operations, a binary classifier that predicts 0/1 for whether a go-around occurs will see very few positive examples, so the classes need to be rebalanced (for example by resampling or class weighting) for the model to learn what a go-around looks like. Another challenge is labelling the data, since we cannot tell at a glance whether a given flight experienced a go-around just by looking at the raw ADS-B columns.
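Both challenges can be sketched in a few lines of plain Python. The labelling rule below is only one possible heuristic of our own: it assumes the input is the time-ordered approach segment of an arriving flight, with altitude in metres, and flags a go-around when the aircraft climbs well above the lowest altitude it has reached before touching down. The function names and the thresholds (`max_alt`, `min_climb`) are hypothetical and would need tuning per airport. The weighting function uses the common "balanced" formula, n_samples / (n_classes * class_count).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def label_go_around(samples: Sequence[Tuple[float, bool]],
                    max_alt: float = 600.0,
                    min_climb: float = 100.0) -> int:
    """Heuristic label for one approach.

    samples: time-ordered (altitude, on_ground) pairs for an arriving flight.
    Returns 1 if, before touchdown, the aircraft climbs at least min_climb
    above its lowest altitude so far, with that lowest point at or below
    max_alt. Returns 0 otherwise. Thresholds are illustrative assumptions.
    """
    lowest = float("inf")
    for altitude, on_ground in samples:
        if on_ground:
            break  # touchdown reached without a significant climb
        if altitude < lowest:
            lowest = altitude
        elif lowest <= max_alt and altitude - lowest >= min_climb:
            return 1
    return 0


def balanced_class_weights(labels: List[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count), so the
    rare class contributes as much to the loss as the frequent one."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical altitude profiles (metres), the last sample being on the ground.
normal_approach = [(900, False), (700, False), (500, False),
                   (300, False), (100, False), (0, True)]
go_around_track = [(900, False), (600, False), (300, False), (150, False),
                   (300, False), (500, False), (800, False),
                   (500, False), (200, False), (0, True)]

print(label_go_around(normal_approach), label_go_around(go_around_track))

# 3 go-arounds in 1000 approaches, matching the roughly 0.3% rate above.
labels = [0] * 997 + [1] * 3
weights = balanced_class_weights(labels)
print(weights)
```

The resulting dictionary can be passed to any classifier that accepts per-class weights. Resampling the minority class is an alternative when the chosen model does not support weighting.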