How to Effectively Leverage Big Data to Maximize Citizen Impact
July 9, 2021
When collecting metrics or analyzing data, the first question should be "how is this actionable?"
I am currently the technical lead on a collaboration between Button and the BC government's Climate Action Secretariat. For the past two years, we have been untangling a decade of greenhouse gas emissions data and developing better data collection and analysis tools to support the CleanBC program.
Here’s how to effectively leverage large swathes of data to maximize its impact.
“Data-driven decisions” are only possible if your data is already actionable.
Before we started working with the Climate Action Secretariat, our client’s team understood that the greenhouse gas emissions data was one of their main assets. The initial challenge was understanding that data in a timely way: because of its unwieldy format, it took months for analysts to extract any useful information, which made the data difficult to act on.
In this case, the process started to change when an executive acted and brought in our development team to improve the data quality. We worked collaboratively to shape the process from there.
A critical factor for success with the Climate Action Secretariat was opening communication channels between the development team and subject experts early on. In practice, this means that a subject matter expert is dedicated to helping the development team, and that analysts and other consumers of the data have a dedicated channel where they can ask our team technical questions. Combined with an agile software development methodology, this allowed us to iterate quickly on improving data quality.
Because all this communication makes the analysts’ work easier, it has led to a gradual shift in how the data is used. For instance, analysts began using business intelligence tools connected to a single source of truth, rather than searching through copies of the data scattered across Excel spreadsheets.
When we helped develop the CleanBC Industrial Incentive Program, a grant application for greenhouse gas emitters, we drew on a variety of data as empirical evidence to prioritize our work and define how the application should be improved. Before we could leverage this data, it needed to be qualified.
An example: Over the previous grant cycle, analysts recorded various data entry issues. By studying these records, we determined which advanced validation features would have the greatest impact in the next grant cycle. Based on that, we developed features that automatically catch data entry errors and prompt the applicant to correct them. These features not only save time for analysts, but also improve the consistency of future data as it is collected.
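To make the idea of catching data entry errors at submission time concrete, here is a minimal sketch in Python. It is not the validation used in the CleanBC application: the field names (facility_name, emissions_tco2e), the rules, and the messages are all hypothetical, chosen only to illustrate the pattern of returning a list of issues the applicant can be prompted to fix.

```python
from dataclasses import dataclass


@dataclass
class ValidationIssue:
    field: str
    message: str


def validate_entry(entry: dict) -> list[ValidationIssue]:
    """Return the issues found in one submitted entry (empty list if none).

    Field names and rules are hypothetical, for illustration only.
    """
    issues = []

    # A required text field: treat missing, None, and blank strings alike.
    name = (entry.get("facility_name") or "").strip()
    if not name:
        issues.append(
            ValidationIssue("facility_name", "Facility name is required.")
        )

    # A numeric field: it must parse as a number and must not be negative.
    try:
        emissions = float(entry.get("emissions_tco2e"))
    except (TypeError, ValueError):
        issues.append(
            ValidationIssue("emissions_tco2e", "Emissions must be a number.")
        )
    else:
        if emissions < 0:
            issues.append(
                ValidationIssue("emissions_tco2e", "Emissions cannot be negative.")
            )

    return issues


# Each issue can be shown next to the field that caused it.
for issue in validate_entry({"facility_name": " ", "emissions_tco2e": "12,5"}):
    print(f"{issue.field}: {issue.message}")
```

Returning structured issues rather than raising on the first error lets the form report every problem in a single pass, which is what makes prompting the applicant practical.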
No matter what tools or techniques you use to make data-driven decisions, the quality of your data is paramount — your decisions can only be as sound as the data they are based on. We improved data consistency based on empirical evidence, but another aspect of quality is timeliness: if it takes months to process and analyze data, it will be extremely hard to make effective decisions based on it. Other aspects of data quality may include completeness, accuracy, and relevancy.
There are many ways you can improve your data. One is normalization, which means reducing data redundancy and ensuring you have a single source of truth. Another is baking metadata (essentially, information about what the data is) into the data itself, which makes analysis and knowledge transfer easier.
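As a rough illustration of both ideas, the sketch below splits a flat, spreadsheet-style list of rows into two related tables, so each facility is described exactly once, and keeps a small metadata dictionary alongside the data. The rows, column names, and units are invented for the example and do not come from the actual emissions dataset.

```python
# Flat rows as they might appear in a spreadsheet: facility details are
# repeated on every line. All values here are made up for illustration.
rows = [
    {"facility": "Mill A", "region": "North", "year": 2019, "emissions": 1200.0},
    {"facility": "Mill A", "region": "North", "year": 2020, "emissions": 1100.0},
    {"facility": "Plant B", "region": "South", "year": 2020, "emissions": 540.0},
]

# Metadata kept with the data: what each column means and its unit.
COLUMN_METADATA = {
    "year": {"description": "Reporting year", "unit": None},
    "emissions": {
        "description": "Total reported emissions",
        "unit": "tCO2e",
    },
}


def normalize(flat_rows):
    """Split flat rows into a facilities table and a reports table."""
    facilities = {}  # facility name -> facility record
    reports = []
    for row in flat_rows:
        name = row["facility"]
        if name not in facilities:
            # First time this facility is seen: store its details once.
            facilities[name] = {
                "id": len(facilities) + 1,
                "name": name,
                "region": row["region"],
            }
        # Reports reference the facility by id instead of repeating it.
        reports.append(
            {
                "facility_id": facilities[name]["id"],
                "year": row["year"],
                "emissions": row["emissions"],
            }
        )
    return list(facilities.values()), reports


facilities, reports = normalize(rows)
print(len(facilities), "facilities,", len(reports), "reports")
```

With this shape, correcting a facility's region means changing one record rather than hunting down every row that repeats it. In a real system the same structure would more likely live in database tables with a foreign key between them.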
Quality data is complete, accurate, and timely.
The more broadly you can share data, the more valuable it is. This is where data transparency is essential. An initiative we see in many governments is open data. Even if your data cannot be openly published, providing free, seamless access to the data in a controlled environment — within a ministry, for instance — is a big win.
One of the first tools we deployed when we started working with the Climate Action Secretariat was an open-source business intelligence tool called Metabase. We created accounts for all employees so that they could look at the data, even if they were not data scientists and did not know how to code. This allowed everyone to share and curate useful queries and dashboards, which not only improved data literacy and data quality, but also promoted a collaborative and open culture.
The final aspect of data quality is relevancy, and qualitative data analysis is a useful tool for ensuring the data you collect is relevant. Surveys with open-ended questions and interviews are key to ensuring that the analysis of the data reflects reality.
This mistake — gathering data in ways that might not reflect the reality of a situation — is common. For example, in software engineering research, many studies provide findings based on quantitative data alone. These studies might draw correlations between a set of metrics, but they fail to back up those findings by interviewing the subjects of the research, which leads to data that lacks relevance. (This mistake is so common that I was guilty of it in academia too.)
A practical guideline whenever you are analyzing data is this: "remember to set goals before looking for Key Performance Indicators."