Big Data Spain 2018: An example business case applicable to aviation

The data science team at Innaxis attended the Big Data Spain 2018 conference to learn about the latest developments in data processing and to sharpen our own skills. It is the biggest Big Data event hosted in Spain, with almost 100 speakers participating in 80 different talks. Although the conference is mainly ICT-oriented, many talks apply to aviation as well, both to identify business cases that could carry over to the aviation ecosystem and to improve the data processing stack so that it handles more data and delivers results faster.

The most interesting business talk was the one presented by Cabify. Their business model has two key performance areas: one is to maximize the revenue of their drivers, and the other is to achieve an optimal match between drivers and riders (clients who want a ride) that minimizes waiting time for both sides.

  • For the former, their first approach was a static fare schema. Riders were more satisfied with this approach, as they could plan ahead with a predictable budget, but there were no incentives for drivers to work on congested days. Now they use a dynamic fare schema that raises fares when demand exceeds supply. The demand/supply ratio is updated by a stream processing pipeline that collects current service times and ride requests and processes them in 1-minute windows. This is a simple solution that does not require any machine learning model: just by monitoring the system, drivers’ earnings have increased by up to 20%.
  • For the latter, their first approach was to select the nearest drivers by straight-line distance, filter out those farther away than a certain threshold, and then match them on a first-come, first-served (FCFS) basis. This approach is heavily affected by congestion: many riders cancelled their rides as soon as the waiting time was longer than expected, which also led to a lower satisfaction ratio. Now the matching uses ETA predictions instead. Drivers are ranked by lowest ETA, those whose waiting time exceeds a certain threshold are excluded, and the rider is matched with the remaining driver who has the lowest revenue at that moment. This model update has yielded noticeable savings in both working hours and kilometers driven (up to 0.5 million km fewer per week).
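
To make the two mechanisms above more concrete, here is a minimal Python sketch. It is not Cabify's implementation: the event kinds ("request", "driver_free"), the capped multiplier rule, the `Driver` fields and all function names are assumptions made up for illustration. The talk only described 1-minute windows over ride requests and service times, and an ETA threshold followed by a lowest-revenue choice.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 60  # 1-minute windows, as described in the talk


def demand_supply_ratio(events):
    """Group (timestamp_seconds, kind) events into 1-minute tumbling windows.

    kind is "request" (demand) or "driver_free" (supply). Both labels are
    hypothetical; the real pipeline's inputs were not detailed.
    """
    counts = defaultdict(lambda: {"request": 0, "driver_free": 0})
    for ts, kind in events:
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[window_start][kind] += 1
    ratios = {}
    for window_start, c in sorted(counts.items()):
        supply = max(c["driver_free"], 1)  # avoid division by zero
        ratios[window_start] = c["request"] / supply
    return ratios


def fare_multiplier(ratio, cap=2.0):
    """Assumed pricing rule: raise fares only when demand exceeds supply."""
    return min(max(1.0, ratio), cap)


@dataclass
class Driver:
    name: str
    eta_minutes: float    # predicted time to reach the rider
    revenue_today: float  # earnings so far


def match_driver(drivers, max_eta_minutes):
    """Drop drivers above the ETA threshold, then pick the lowest earner.

    Ties on revenue are broken by the shorter ETA (an assumption).
    """
    candidates = [d for d in drivers if d.eta_minutes <= max_eta_minutes]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.revenue_today, d.eta_minutes))
```

For example, three requests and one free driver within the same minute give a ratio of 3.0, which this sketch caps at a 2.0 fare multiplier. Picking the lowest earner among the drivers within the ETA threshold is what spreads revenue across drivers instead of always rewarding the closest one.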

What makes the approach interesting is that they built their ETA prediction model from scratch instead of using a commercial ETA API. They map their service area onto hexagonal cells, and each ride is tracked and stored in the data platform. The historical data is used to train a deep learning model that generates journeys from one place to another in order to make predictions. Using hexagonal cells provides some handy semantic features that improve prediction performance several times over, using rules similar to collocations in natural language processing. The model also infers knowledge by creating clusters of nearby areas and identifying main roads, the most-taken directions and catchment areas in minutes.
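
As a rough illustration of the hexagonal mapping, the sketch below turns a ride's position trace into a sequence of hexagonal cell IDs and counts cell-to-cell transitions, much like counting bigrams in text. The talk did not say which grid system or features Cabify uses, so everything here is an assumption: planar coordinates in meters, a pointy-top axial grid, and the hypothetical function names `point_to_hex`, `ride_to_cells` and `count_transitions`.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    x, z = q, r
    y = -x - z
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the component with the largest rounding error so that
    # the cube constraint x + y + z == 0 still holds.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def point_to_hex(x, y, size):
    """Map a planar point (meters) to an axial (q, r) cell; size is the
    center-to-corner distance of a pointy-top hexagon."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def ride_to_cells(trace, size):
    """Convert a list of (x, y) positions into the cells visited,
    collapsing consecutive repeats of the same cell."""
    cells = []
    for x, y in trace:
        cell = point_to_hex(x, y, size)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def count_transitions(rides):
    """Count how often one cell is followed by another across rides."""
    transitions = Counter()
    for cells in rides:
        transitions.update(zip(cells, cells[1:]))
    return transitions
```

Frequent transitions between neighboring cells are the kind of signal that could hint at main roads or dominant directions, although the features the actual model learns may be quite different.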

This approach could be used in aviation in several ways, for example to manage airport passenger flows or to apply optimal 4D trajectories in airspace. This is just one example of the ideas that were shared at the conference.

Interested in the more technical details? We will cover the technical talks in a future blog post. Stay tuned!

About Author

Jorge Martín

Jorge is our go-to person for any data, computer science, programming or ICT related questions. Jorge enjoys performing research and software development and dabbles in project management.
