From batch processing to streaming processing in Aviation

Dario Martinez

2021-09-14 13:18:45

Batch vs Streaming processing

In traditional data pipelines, analysing today’s events often means waiting until the next scheduled job has finished, which may not be until tomorrow night. This is often referred to as batch processing. In batch processing, we wait for a certain amount of raw data to “pile up” before running an ETL (extract, transform, load) job. Typically, this means data accumulates for anywhere from an hour to a few days before it is analysed. Batch ETL jobs are typically run on a set schedule (e.g. every 24 hours) or, in some cases, once the amount of data reaches a certain threshold.
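To make the two triggers concrete, here is a minimal Python sketch of a batch buffer that runs a job either when a schedule interval has elapsed or when a record threshold is reached. The class name `BatchTrigger`, the `etl_job` callback and the default values are all hypothetical, chosen only for illustration. For simplicity the schedule is checked when a record arrives, whereas a real scheduler would fire on its own clock.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchTrigger:
    """Accumulate raw records and run an ETL job on a schedule or a size threshold."""
    etl_job: Callable[[List[dict]], None]  # hypothetical job applied to the whole batch
    interval_s: float = 24 * 3600          # scheduled run, e.g. every 24 hours
    max_records: int = 1000                # threshold run, whichever comes first
    last_run: float = 0.0                  # timestamp of the previous run, in seconds
    buffer: List[dict] = field(default_factory=list)

    def add(self, record: dict, now: float) -> bool:
        """Buffer one record; return True if this call triggered the ETL job."""
        self.buffer.append(record)
        due = now - self.last_run >= self.interval_s
        full = len(self.buffer) >= self.max_records
        if due or full:
            self.etl_job(self.buffer)  # process everything that piled up
            self.buffer = []
            self.last_run = now
            return True
        return False
```

Note that no record is analysed until one of the two conditions is met, which is exactly the latency that stream processing aims to remove.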

In many industries (including aviation), new data becomes available every few seconds (e.g. aircraft positions), minutes (e.g. meteorological information) or hours (e.g. flight plans and regulations data). In such scenarios, real-time stream processing is a paradigm worth leveraging. Data is processed as soon as it arrives at the data platform, typically within sub-second timeframes. For the end user, this is close enough to real time. These operations usually keep only a “small” state, and as such involve relatively simple transformations or calculations. Real-time stream processing technologies are undoubtedly powerful. However, they introduce considerable complexity into infrastructure and data pipelines. The tooling, the data formats and the demanding requirements of real-time processing systems are a challenge for any data engineer.
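As a rough illustration of what a “small” state means, the Python sketch below updates a running mean per key (for instance, ground speed per aircraft) as each event arrives, storing only a count and a mean for every key rather than the full history. The function name `running_mean_by_key` and the event shape are assumptions made for this example, not part of any particular streaming framework.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_mean_by_key(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Yield (key, mean so far) after every event, keeping only a count and a mean per key."""
    count: Dict[str, int] = defaultdict(int)
    mean: Dict[str, float] = defaultdict(float)
    for key, value in events:
        count[key] += 1
        # Incremental update: no past values need to be stored.
        mean[key] += (value - mean[key]) / count[key]
        yield key, mean[key]
```

Because `events` can be any iterable, including a generator fed by a message queue, each result is available as soon as the corresponding event has been consumed.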

The real-time streaming stack and methodology

What are the use cases for real-time stream processing outside the aviation industry?

Following are some of the use cases.

What about the aviation industry?

In the aviation industry, there are many data sources that enable real-time streaming case studies:

  • Automatic Dependent Surveillance-Broadcast (ADS-B) data is usually served as a stream of aircraft position data. Many companies such as FlightAware provide APIs that enable immediate real-time streaming analytics. Analysing these streams at real-time speed is something that not many companies have yet looked into. Some common aviation case studies could be extended to real time, such as trajectory prediction and sector complexity forecasting.
  • Flight Data Monitoring (FDM) has gradually increased in use across aviation. It’s a natural progression from the black boxes of the past, which provided a way for aircraft operators to investigate accidents. As an investigative tool, though, black boxes were often complex and technical. Despite advances in data science and the efforts of some companies to provide analytics over FDM, much of the value that this data source can provide remains untapped. With the adoption of real-time stream processing, airlines and manufacturers could use FDM data analysis to its full potential, enabling case studies involving real-time recommendations and predictive alert systems, such as detecting unstable approaches and flight anomalies in real time.
  • Airport data-driven cases are quite challenging given the number and variety of data sources and data providers (e.g. airline ticketing, baggage transportation, passenger movements, retail, weather). Furthermore, some of these providers deliver data through legacy interfaces, which raises the complexity of the case study even higher. However, with the correct stack of technologies and sufficient expertise in data-driven real-time case studies, these challenges can be overcome.
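As a toy example of the kind of predictive alert mentioned in the FDM item above, the Python sketch below checks each flight data sample as it arrives and flags those that look like an unstable approach. The thresholds, the field names (`height_ft`, `vertical_speed_fpm`) and the single sink-rate rule are illustrative assumptions only. A real system would combine several criteria defined by the operator.

```python
from typing import Dict, Iterable, Iterator

# Illustrative thresholds only (assumptions, not operational criteria).
GATE_HEIGHT_FT = 1000.0     # only check samples below this height above the runway
MAX_SINK_RATE_FPM = 1000.0  # flag descents steeper than this rate


def unstable_approach_alerts(samples: Iterable[Dict[str, float]]) -> Iterator[Dict[str, float]]:
    """Yield each sample that exceeds the sink-rate limit below the gate height.

    Each sample is a dict with hypothetical keys: 't' (seconds),
    'height_ft' (height above the runway) and 'vertical_speed_fpm'
    (negative when descending).
    """
    for s in samples:
        below_gate = s["height_ft"] < GATE_HEIGHT_FT
        sinking_fast = -s["vertical_speed_fpm"] > MAX_SINK_RATE_FPM
        if below_gate and sinking_fast:
            yield s  # emitted as soon as the offending sample is seen
```

The rule is evaluated one sample at a time, so an alert can be raised while the approach is still in progress instead of after the flight data has been downloaded and processed in a batch.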

Do you have any challenging case studies that may involve streaming processing? Are you looking to level up your data analysis capabilities by implementing real-time data pipelines? If you are interested in collaborating with us at DataBeacon, contact us!

Author: Dario Martinez

© datascience.aero