Can distributed apps store sensitive data and remain GDPR compliant?

Antonio Fernandez

2019-06-13 13:17:37

Since the General Data Protection Regulation (GDPR) took effect in May 2018, ensuring privacy and data protection has become a headache for most companies in the European Union, which have had to adapt their platforms accordingly. However, not all technologies are prepared to accommodate the restrictions imposed by GDPR. It is therefore worth reviewing how different technology strategies align with the regulation.

As one example, are distributed ledger technologies (DLT), most commonly known as blockchain, compatible with GDPR? Can DLT complement GDPR in some way? How could blockchain comply with the ‘right to be forgotten’ imposed by GDPR?

The immutability property immediately comes to mind, since it affects any scenario in which personal information is meant to be saved in the ledger: transactions can’t be removed from the ledger, and this permanence is one of the main goals of blockchain technology. An alternative approach is to save personal data off-chain and use the blockchain only to point to and validate that information. Though possible, this is rather complex, and some core blockchain benefits, such as data ownership or transparency, could be lost. With this in mind, GDPR could seemingly limit the kind of information that can be stored on a ledger. It would affect public blockchains more than private or consortium blockchains, because public blockchains are permissionless and anyone can read their contents.
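
The off-chain approach can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real blockchain client: the `ledger` list stands in for an append-only chain, the `off_chain` dictionary stands in for an erasable database, and the function names `register`, `verify` and `forget` are hypothetical.

```python
import hashlib
import json

ledger = []      # append-only stand-in for the chain (hypothetical)
off_chain = {}   # erasable stand-in for an off-chain database (hypothetical)


def digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def register(record_id: str, record: dict) -> None:
    """Keep the personal data off-chain and anchor only its hash on the ledger."""
    off_chain[record_id] = record
    ledger.append({"id": record_id, "sha256": digest(record)})


def verify(record_id: str) -> bool:
    """Check that the off-chain record still matches the hash on the ledger."""
    record = off_chain.get(record_id)
    anchor = next((e for e in ledger if e["id"] == record_id), None)
    if record is None or anchor is None:
        return False
    return digest(record) == anchor["sha256"]


def forget(record_id: str) -> None:
    """Erase the personal data; the ledger entry itself is never modified."""
    off_chain.pop(record_id, None)


register("pax-1", {"name": "Jane Doe", "connection": "MAD-FRA"})
print(verify("pax-1"))
forget("pax-1")
print(verify("pax-1"))
```

After `forget`, the ledger still holds the hash, but there is no longer any off-chain record to validate against it. This is the trade-off described above: erasure becomes possible, at the cost of moving the data itself outside the chain.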

Hyperledger Fabric, one of the most popular distributed ledger technologies, has recently started to tackle this issue. Until now, data access permissions between network members have been managed through channels, which are private subnets designed for committing transactions confidentially. However, channel management carries significant administrative overhead, which limits the number of channels that is practical within the same network. For this reason, the Fabric developer team recently introduced the ‘private data’ feature, which enables the creation of data collections and the definition of access control policies for the stakeholders. This feature aims to control data access rights directly, defining which attributes are public or private and which network members can access them.

A distributed app (D-App) for the aviation industry can illustrate the benefit of the private data feature. As one example, imagine a use case with an airline (the data owner) and several data consumers, all participating in a private Hyperledger Fabric blockchain network for managing dataset policies. The airline could allow different analytics over the same dataset while restricting which information is visible to each stakeholder in order to keep the data secure. A potential use case could be a broader dissemination of encrypted information about connecting passengers, which could be validated and used in prioritization mechanisms that improve passenger connectivity.

According to GDPR, the airline must provide the data consumers only with the specific variables needed to carry out the distributed app. To this end, the airline defines private data collections in Fabric, exposing only certain variables to the data consumers involved in the D-App development. It creates a collection named Flights_analyst with variables readable only by analysts, and isolates the personal information in a separate collection named Flights_sensitive containing all sensitive data. Authorized apps retrieve the hashed information from the channel ledger and, if they are allowed to access it, the real data from the private database.
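
As a rough illustration, the two collections could be described with a collection definition like the one generated below. The field names (`name`, `policy`, `requiredPeerCount`, `maxPeerCount`, `blockToLive`, `memberOnlyRead`) follow Fabric's collection definition format, but the organization identifiers `AirlineMSP` and `AnalystMSP`, the peer counts and the helper function `collection` are assumptions made for this sketch, not part of the scenario described above.

```python
import json


def collection(name, member_msps, block_to_live=0):
    """Build one collection entry; member_msps lists the organizations allowed to hold the data."""
    members = ", ".join(f"'{msp}.member'" for msp in member_msps)
    return {
        "name": name,
        "policy": f"OR({members})",
        "requiredPeerCount": 0,        # assumed value for the sketch
        "maxPeerCount": 1,             # assumed value for the sketch
        "blockToLive": block_to_live,  # 0 keeps the private data indefinitely
        "memberOnlyRead": True,        # only collection members may read
    }


collections = [
    # Variables that analysts are allowed to read (hypothetical MSP IDs).
    collection("Flights_analyst", ["AirlineMSP", "AnalystMSP"]),
    # Personal information, visible to the airline only.
    collection("Flights_sensitive", ["AirlineMSP"]),
]

print(json.dumps(collections, indent=2))
```

The printed JSON has the shape of a collection definition file that would be supplied when the chaincode is deployed. Keeping the analyst variables and the sensitive variables in two collections is what lets each stakeholder see only its own subset.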

Additionally, the airline may not want the data to be available forever, so it sets a blockToLive value, which limits the lifetime of the data to a specific number of blocks. Once that number of blocks is exceeded, the data is purged from the private database.
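
The effect of blockToLive can be pictured with a small simulation. This is a toy model rather than Fabric code: the class `PrivateStore` and its methods are hypothetical, and the exact block at which Fabric purges an entry is treated here as an assumption (an entry is dropped once more than `block_to_live` blocks have been committed since it was written).

```python
from dataclasses import dataclass, field


@dataclass
class PrivateStore:
    """Toy private database that purges entries after block_to_live blocks."""
    block_to_live: int
    height: int = 0
    _data: dict = field(default_factory=dict)  # key -> (value, block written at)

    def put(self, key, value):
        self._data[key] = (value, self.height)

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry is not None else None

    def commit_block(self):
        """Advance the chain by one block and purge expired entries."""
        self.height += 1
        if self.block_to_live <= 0:
            return  # 0 means the data is kept indefinitely
        expired = [
            key for key, (_, written_at) in self._data.items()
            if self.height - written_at > self.block_to_live
        ]
        for key in expired:
            del self._data[key]


store = PrivateStore(block_to_live=3)
store.put("pax-1", {"name": "Jane Doe"})
for _ in range(3):
    store.commit_block()
print(store.get("pax-1"))  # still available within its lifetime
store.commit_block()
print(store.get("pax-1"))  # purged once the lifetime is exceeded
```

In this model the purge is driven purely by block height, so no participant has to remember to delete the data.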

By using the private data functionality, we can affirm that the distributed application complies with GDPR, first because consumers get only limited access to the data, and second because the data is available only while it is needed and is purged after a fixed number of blocks. This is effectively an automatic implementation of the ‘right to be forgotten’ principle written in GDPR. Nothing sensitive remains in the ledger, and the hash that stays there is only meaningful so long as the information is available within the collection database. In this way, the private data functionality helps manage data privacy in distributed ledger environments without the need for third-party involvement.

To sum up, the architecture presented in this scenario relies on several conditions that must be met beforehand to comply with GDPR principles. In particular, no network member should have malicious intentions when using the data. This approach only manages who has access to read specific data: once the data is shared with a participant, blockchain technology can do nothing to control its usage or prevent it from being shared with third parties. Nevertheless, this can be addressed off-chain by defining strict data protection policies that cover malicious use of the data by a member.

© datascience.aero