Using data normalization to fix opaque and inaccessible data
July 9, 2021
Data is useless in a vacuum: all the facts and statistics in the world won’t do you any good if you can’t access them easily or understand them within their broader context. No matter how much information you’ve got, if you can’t effectively parse it, it’s no good.
This is the state of most government data: locked away in opaque and inaccessible silos scattered across ministries and organizations and lacking any system of unique identifiers that would allow users to cross-reference. It is possible today to cross-reference governmental data, but only with a huge investment of time and grunt work.
We’re sitting on an almost inconceivable amount of data and no good way to put it to use. Even if you know what you need, where do you even begin looking?
The underlying issue is that data isn’t being shared. By opening these data silos, you can normalize your data into a single, unified, and easily navigable dataset.
And it’s not even that hard to do, provided you know what you’re doing.
Here’s an example of how the current siloed system falls short: BC's natural resources agencies maintain various data sources describing the activity of industrial facilities and their impact on the environment, but there is no unique facility ID or data warehouse to cross-reference that information.
Here’s how that information is currently organized:
All this information is closely linked, yet to gather it all you need to use multiple sites and services. When the puzzle is broken into so many complicated pieces it’s impossible to see the big picture and make informed choices.
A single, normalized dataset would gather all this information in one place and make it easy for anyone, in government or the public alike, to find the information they’re looking for. It offers a myriad of benefits:
A normalized dataset empowers evidence-based decisions: it’s easy to lay your hands on the statistics and data that are relevant to you and filter out the stuff that isn’t. Plus, when data is easy to access, opportunities become easy to spot. When you use these insights to guide the shaping of long-term goals and the strategic use of resources, efficiency rises.
This system naturally creates clarity of purpose because the public knows what is being done, and more importantly, why. The public has a right to know what projects are being proposed in their backyard, which have been approved, how those projects are going, and how these developments might impact them. Transparent decisions let the public see the value and engage with government projects.
More transparency can also mean less overhead and waste. In 2018, proactively disclosing information led to a 60% decrease in Freedom of Information requests. (Processing the average FOI request costs $3000, resources that can be more efficiently spent elsewhere.)
An informed public is a confident and empowered one. Data transparency is not only more efficient—it also builds trust.
The best part about normalizing data sets is that you only have to do it once.
Once in place, the system is easy to maintain, because all new incoming data is formatted the same way. As much as 80% of the upfront work in creating a unified dataset is applying a naming convention to the data in different silos: creating consistent identifiers and keywords between databases to make cross-referencing easy.
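As a minimal sketch of what that naming convention buys you, suppose two silos record the same facility under different local spellings. (The facility names, field names, and ID scheme below are hypothetical, not BC's actual data.) A one-time "crosswalk" from every known local spelling to a canonical ID makes cross-referencing a simple merge:

```python
# Records as two hypothetical silos might store them today
permits = [{"facility": "Acme Pulp Mill", "permit_no": "PE-1042"}]
emissions = [{"site_name": "ACME PULP MILL (KAMLOOPS)", "co2_tonnes": 512}]

# One-time crosswalk: every known local spelling -> one canonical facility ID
crosswalk = {
    "acme pulp mill": "FAC-0001",
    "acme pulp mill (kamloops)": "FAC-0001",
}

def canonical_id(name: str) -> str:
    """Normalize a silo's local name and look up its canonical facility ID."""
    return crosswalk[name.strip().lower()]

# With a shared ID, joining across silos is a simple dictionary merge
facility = {}
for rec in permits:
    facility.setdefault(canonical_id(rec["facility"]), {}).update(rec)
for rec in emissions:
    facility.setdefault(canonical_id(rec["site_name"]), {}).update(rec)

# Both the permit and the emissions record now sit under one ID
print(facility["FAC-0001"])
```

The hard part, of course, is building the crosswalk in the first place; once it exists, every downstream query gets the join for free.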
Even after normalization is complete, you must keep assigning unique IDs to new entities as they appear. Otherwise, the normalized dataset drifts out of accuracy as new data comes in.
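The ongoing maintenance step can be sketched like this: when a record arrives, reuse the existing ID for a known entity, and mint a fresh one for an entity you've never seen before. (The ID format and names are again hypothetical.)

```python
import itertools

# IDs already assigned during the initial normalization
known_ids = {"acme pulp mill": "FAC-0001"}
_counter = itertools.count(2)  # next unused ID number

def get_or_assign_id(name: str) -> str:
    """Return the existing ID for a known facility, or mint a new one."""
    key = name.strip().lower()
    if key not in known_ids:
        known_ids[key] = f"FAC-{next(_counter):04d}"
    return known_ids[key]

get_or_assign_id("Acme Pulp Mill")    # already known, keeps its ID
get_or_assign_id("New Sawmill Ltd.")  # new entity, gets a fresh ID
```

In practice the "known" check would be backed by a shared registry rather than an in-memory dictionary, so every ministry mints IDs against the same source of truth.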
In the environmental data example from the previous section, unique IDs across facilities were the “missing piece” preventing the benefits of data normalization.
The main challenge in cracking open information silos isn’t anything technical — it’s communication. An unobstructed and regular flow of information between government ministries is essential. Just as you may need data from another area to inform your decisions, the data you’ve gathered could prove invaluable to someone else.