Digital Sky Challenge, the challenges behind the challenge

Samuel Cristobal

2019-11-20 16:43:55

The Digital Sky Challenge

The Digital Sky Challenge is a 48-hour innovation sprint aimed at creating new digital technologies that will help meet the future needs of the European aviation system in terms of capacity, safety, efficiency, and environmental impact. From 2 to 4 December in Athens, kindly hosted by Athens International Airport, around 60 developers, designers, data analysts and software architects will team up to modernise the aviation industry.

The Digital Sky Challenge is an unprecedented effort to bring together different partners in aviation. Airlines, authorities, airports and service providers each contribute not only expertise but also actual operational data. Innaxis has been gathering this data so that participants can focus on developing the best solutions rather than on data wrangling. This post is about the challenges behind the challenge.

Homogeneous sources

Data comes in different shapes and carries different value, and a merged dataset is worth more than the sum of its parts. To be merged, however, the datasets need to overlap, usually in time and space.

For the Digital Sky Challenge, a one-month period and a geographical coverage area were initially selected and then refined according to the data available. In most cases, the desired data is simply not available, and the problem needs to be reformulated so that the data at hand provides sufficient ground truth for further analysis.

In addition, elements across datasets need to be identifiable, which poses a further challenge when the identifying fields are also confidential. Innaxis used the DataBeacon platform, which solves this issue using cryptography in the form of Secure Data Frames (SDFs), as explained below.

The technology gap

Some partners already have RESTful APIs ready to be consumed; in some cases, data needed to be fished out of a data lake; others provided a set of Excel sheets. There is nothing wrong with storing data in Excel; in fact, for really small data it is often a better approach than a data lake. It is the combination of several technologies that makes the task challenging. Luckily, Innaxis’ DataBeacon is able to speak with virtually any system and collect data seamlessly.

Regardless of the effort required, this points to the need for a digital revolution in aviation. Standardisation is lacking and the adoption of new technology is uneven. In an era of near light-speed advances, the later stakeholders join the digital revolution, the harder it is for them to catch up.

Privacy and trust

The biggest challenge was not technological. Aviation stakeholders are not used to sharing data. Regardless of good intentions, there are doubts and uncertainties about whether shared data could be leaked or compromised. In some cases, confidential fields can simply be removed; in others, removing those fields reduces the quality of the dataset. This applies in particular when confidential fields are needed to merge data with other sources. For example, without the day of a flight, it would not be possible to find the weather conditions for that flight.

To solve this issue, Innaxis used the Secure Data Frame (SDF) approach. This approach substitutes sensitive fields with a seemingly random string of characters, in such a way that the same input always produces the same output while the original value is almost impossible to recover from the output. Data is then merged using the coded strings. In the final release, those strings are randomised and no information about the original values is retained.
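
The post does not describe how SDFs are implemented, so the following is only a minimal sketch of the three steps above (code, merge, randomise) under stated assumptions. It uses a keyed hash (HMAC-SHA256) as a stand-in for the coding step; the function names, field names and sample records are all hypothetical and are not part of DataBeacon.

```python
import hashlib
import hmac
import secrets


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed hash: the same value and key always give the same coded string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def merge_on_token(left, right, field):
    """Inner join of two lists of dicts on a field that is already coded."""
    index = {}
    for row in right:
        index.setdefault(row[field], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[field], []):
            merged.append({**match, **row})
    return merged


def randomise_tokens(rows, field):
    """Swap each distinct coded string for a fresh random one before release."""
    mapping = {}
    released = []
    for row in rows:
        token = row[field]
        if token not in mapping:
            mapping[token] = secrets.token_hex(8)
        released.append({**row, field: mapping[token]})
    return released


# Assumption: a secret key shared only for the duration of the merge.
key = secrets.token_bytes(32)

# Hypothetical records from two partners sharing a confidential identifier.
flights = [
    {"flight_id": "AB123", "delay_min": 12},
    {"flight_id": "CD456", "delay_min": 0},
]
turnarounds = [{"flight_id": "AB123", "gate": "B7"}]

coded_flights = [
    {**r, "flight_id": pseudonymise(r["flight_id"], key)} for r in flights
]
coded_turnarounds = [
    {**r, "flight_id": pseudonymise(r["flight_id"], key)} for r in turnarounds
]

merged = merge_on_token(coded_flights, coded_turnarounds, "flight_id")
released = randomise_tokens(merged, "flight_id")
```

In this sketch, records are matched without either side seeing the original identifier, and the released strings are unrelated to both the original values and the intermediate coded strings.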

After this data-sharing rehearsal, trust increased, creating a more suitable environment for future data sharing and collaboration.

Finding the right size

During the event, analysts will have only 48 hours and their personal computers to crunch numbers. Therefore, the data needed to be packed down to a reasonable size. In some cases, this involved selecting the right columns and discarding duplicates or records that are unrepresentative or carry little information. Usually, size reduction is done using PCA or a similar algorithm. In this case, it was better to keep the original fields without transformation, since they can easily be understood and communicated.
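
As an illustration of this kind of reduction, here is a minimal sketch that keeps a chosen set of columns and drops exact duplicates, leaving the original fields untransformed. The `shrink` helper, the column names and the sample records are hypothetical; the actual preparation pipeline is not described in this post.

```python
def shrink(rows, keep_columns):
    """Keep only the chosen columns and drop exact duplicate records."""
    seen = set()
    result = []
    for row in rows:
        slim = tuple(row.get(col) for col in keep_columns)
        if slim in seen:
            continue  # identical record already kept
        seen.add(slim)
        result.append(dict(zip(keep_columns, slim)))
    return result


# Hypothetical input: the "note" column is dropped, which makes two rows identical.
records = [
    {"airport": "ATH", "date": "2019-12-02", "delay_min": 5, "note": "a"},
    {"airport": "ATH", "date": "2019-12-02", "delay_min": 5, "note": "b"},
    {"airport": "ATH", "date": "2019-12-03", "delay_min": 0, "note": "c"},
]

small = shrink(records, ["airport", "date", "delay_min"])
```

Unlike PCA, the output columns are the original ones, so they stay readable to anyone who knows the source data.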

Final words

In addition to the data preparation tasks, three members of Innaxis will mentor the teams in the technical aspects of accessing and extracting knowledge from the data using machine learning. Moreover, Innaxis will also award a special prize to the most amazing AI-based solution.

Whether you are participating in the challenge or simply curious about the solutions, we hope to see you in Greece!

© datascience.aero