This three-month safetytech pilot trialled a digital solution to automatically anonymise and desensitise health and safety data.
1.4 days compared to 12.5 years
Through its regulatory activities in the United Kingdom, the UK Government’s Health and Safety Executive (HSE) accumulates large volumes of industrial health and safety information from inspections, mandatory reporting, accident investigations and prosecutions. Its records contain information on industrial operating experience, how well risks are controlled across workplaces and the causes of serious accidents. Similarly, individual organisations hold a wealth of additional valuable information on the management and control of health and safety risks across their workplaces, along with learnings from accidents and near-miss incidents.
If these sources of health and safety information were combined and analysed, there is potential to improve how we learn from previous incidents and to put those learnings to effective use. An obstacle to sharing data in this way is the risk of tracing data back to organisations or individuals, breaching data protection laws or exposing commercially sensitive information. To be able to exploit this data, HSE sought to automatically redact personally and commercially sensitive information and sources from these documents.
This Safety Accelerator challenge was set in partnership with the Discovering Safety Programme (DSP), an ambitious endeavour to unlock value held in health and safety data, funded by Lloyd's Register Foundation and jointly delivered by the Health and Safety Executive and the University of Manchester through the recently established Thomas Ashton Institute. The DSP sought innovative, automated techniques to effectively desensitise and anonymise health and safety records. Automated anonymisation of data could ultimately enable a much more diverse set of data to be curated, unlocking much higher-value knowledge outputs.
The DSP defined their challenge and, in collaboration with the Safety Accelerator and programme partners Plug and Play Tech Center, identified the parameters and types of solutions that would meet it. Nine global safetytech startups were sourced, each with solutions suitable for the trial. After a rigorous shortlisting and competitive selection process, London-based startup Ohalo was chosen to collaborate with the DSP on a three-month funded pilot.
Ohalo automates data governance to meet regulations like GDPR and state data protection laws. Ohalo’s goal is to make data aware of what regulations apply to it. This ensures that the correct data controls are implemented, regardless of whether the data is held within an organisation or by its third-party partners.
Ohalo installed their solution, in collaboration with the DSP, at HSE’s premises to evaluate construction RIDDOR data (The Reporting of Injuries, Diseases and Dangerous Occurrences Regulations), which had already been anonymised manually in 2017 and made public.
The Ohalo solution was trained on HSE data and processed almost 2000 RIDDOR documents to identify and redact personal data. Data to be redacted included:
The HSE data science team examined how effectively Ohalo redacted sensitive information, including whether individuals or entities could still be identified by ‘joining up the dots’, for instance by linking personally identifiable information (PII) to public data to infer identity. When the manual and automated anonymisation were compared, some differences were found in the levels of under- or over-redaction, along with inconsistencies arising from manual alterations, such as spelling corrections made during the manual anonymisation.
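The comparison of manual and automated redaction can be pictured with a small sketch. This is not the evaluation process HSE used, which the case study does not describe; it assumes, purely for illustration, that each method's output can be reduced to a set of redacted terms per document. The function name `compare_redactions` and the sample terms are hypothetical.

```python
def compare_redactions(manual: set[str], automated: set[str]) -> dict[str, set[str]]:
    """Compare two sets of redacted terms for one document.

    Assumption: the manual redaction is used as the reference, so a term
    redacted manually but missed by the tool counts as under-redaction,
    and a term redacted only by the tool counts as over-redaction.
    """
    return {
        "agreed": manual & automated,
        "under_redacted": manual - automated,
        "over_redacted": automated - manual,
    }


# Hypothetical example terms, not taken from any HSE record.
manual_terms = {"John Smith", "Acme Scaffolding Ltd", "07700 900123"}
automated_terms = {"John Smith", "07700 900123", "Manchester"}

result = compare_redactions(manual_terms, automated_terms)
for label, terms in result.items():
    print(label, sorted(terms))
```

Because the manual set also contained alterations such as spelling corrections, treating it as a perfect reference is itself an assumption; a term flagged here as over-redacted may simply be one the manual process missed.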
As part of the pilot, Ohalo enabled staff to update the machine learning models with example training information to improve future redaction. As a visual description, the images below show one of the methods for enhancing the solution.
"The Lloyd's Register Safety Accelerator was instrumental in setting up and bringing forward the project with the HSE. We felt from the beginning that our technology could drive real value in anonymising sensitive data from the safety and accident reports that HSE manages, so that the anonymised records can more easily be used by third parties to reduce accidents and fatalities in the UK. However, it was a fairly high-risk project and so the Safety Accelerator was the perfect vehicle for the HSE and Ohalo teams to work together, in iteratively improving the algorithms to get to a point where HSE would be comfortable with sharing the data. From the beginning, LR was able to assist us in managing the project and providing feedback from their experience with other clients. The great work between the three parties resulted in the great outcome we achieved, which is going into production in the near future, hopefully changing the lives of many workers for the better."
Kyle DuPont, CEO, Ohalo Limited
The performance of Ohalo’s solution was critical to its usefulness in the health and safety industry. Performance during this pilot was measured in the following ways:
- Accuracy – of the 1,998 documents evaluated, 94 records retained PII, and only 19 of those were considered to be a significant breach of GDPR. Counting significant breaches only (19 of 1,998, or about 1%), this shows that the Ohalo solution is capable of anonymising HSE RIDDOR records with an accuracy of 99%.
- Speed – HSE have over 600k RIDDOR reports and over 1m documents of other types. Ohalo’s technology allows HSE to anonymise these large archives at a speed, accuracy and cost not currently possible using manual processes. In general, it takes around 0.2 seconds to redact a single document using Ohalo. At that rate, automated anonymisation of the original 600k document set, removing 99% of PII, is possible within just 1.4 days, compared to 12.5 years when carried out manually. Ongoing monthly processing of 10k newly received documents takes around 33 minutes using the Ohalo solution.
Ohalo anticipate that refining the solution further and resolving the outstanding issues could reduce the number of significant breaches, raising anonymisation accuracy to 99.5%.
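The headline figures above can be reproduced with some simple arithmetic. The sketch below uses only the numbers quoted in this case study and assumes documents are processed one after another at a constant 0.2 seconds each; reading the 99% accuracy figure as being based on significant breaches is an inference from those numbers, not a stated method. The constant and function names are illustrative.

```python
# Figures quoted in the case study.
SECONDS_PER_DOCUMENT = 0.2
ARCHIVE_SIZE = 600_000        # RIDDOR reports in the original archive
MONTHLY_INTAKE = 10_000       # newly received documents per month
DOCS_EVALUATED = 1998
RETAINED_PII = 94             # records where some PII remained
SIGNIFICANT_BREACHES = 19     # records judged a significant GDPR breach


def processing_seconds(n_docs: int, seconds_per_doc: float = SECONDS_PER_DOCUMENT) -> float:
    """Total processing time, assuming sequential processing at a constant rate."""
    return n_docs * seconds_per_doc


def share_without(failures: int, total: int) -> float:
    """Fraction of documents that did not fail the given criterion."""
    return 1 - failures / total


archive_days = processing_seconds(ARCHIVE_SIZE) / 86_400   # seconds per day
monthly_minutes = processing_seconds(MONTHLY_INTAKE) / 60
accuracy_significant = share_without(SIGNIFICANT_BREACHES, DOCS_EVALUATED)
accuracy_any_pii = share_without(RETAINED_PII, DOCS_EVALUATED)

print(f"Archive of {ARCHIVE_SIZE:,} documents: {archive_days:.1f} days")
print(f"Monthly intake of {MONTHLY_INTAKE:,} documents: {monthly_minutes:.0f} minutes")
print(f"Accuracy counting significant breaches only: {accuracy_significant:.1%}")
print(f"Accuracy counting any retained PII: {accuracy_any_pii:.1%}")
```

Under these assumptions the archive takes about 1.4 days and the monthly intake about 33 minutes, matching the quoted figures. Accuracy comes out at about 99.0% when only significant breaches are counted and about 95.3% when any retained PII is counted, which suggests the 99% figure refers to the former.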
The pilot demonstrated that it is possible to redact personal and sensitive health and safety data from unstructured data sets to a very high degree of accuracy. Using this anonymised data, HSE will be able to share information with third-party researchers to further its mission of preventing death, injury and ill health to those at work and those affected by work activities.
There is no doubt that the sharing of HSE data to create large datasets from which we can gain insight will lead to improvements in safety. An obstacle to achieving an environment for sharing of HSE data is the risk that companies or individuals could be identified, especially considering the nature of the data about an individual’s injuries suffered at work. Such breaches not only compromise the rights and freedoms of individuals, but they can also lead to consequences for the originating companies.
The solution deployed and developed during this pilot has the potential to be used in wider industry:
- Enabling data to be shared in a GDPR-compliant way
- Enabling organisations to receive health and safety data from industry partners and other subject matter experts in a GDPR-compliant way
- Allowing organisation and categorisation of documents with machine learning, to prepare large data sets for more effective analysis in the future.
"With the support of the Lloyd's Register Safety Accelerator, the Health and Safety Executive were able to work with Ohalo to anonymise highly sensitive incident reports, so they could be more easily used for research. We developed a process to evaluate the accuracy of the automatic anonymisation to understand the performance of the process. We are now working on the anonymisation of other HSE and third-party datasets to facilitate future research to better understand and address the real-world risks in the United Kingdom and beyond."
Data Science and Software Team Lead, HSE Science and Research Centre.