I clearly remember the moment our VP of Engineering came to me and said, “Our data and reporting are a mess.” That was when we admitted that we had to improve how we manage and organize data and, even more importantly, how we provide it to our customers.
When a company turns the corner and, in a very short time, goes from being a start-up to being an important player doing business with big companies, everything needs to evolve, transform, and improve. This is fair, and everyone would love to be involved in such an interesting process, including me. But (there is always a but) each new refactoring and each new improvement brings challenges, study, and risks to take into account. This was certainly true when I started to analyze and design a new reporting system covering both the data, in terms of movement and transformation, and the system that provides it to our customers.
Chapter 1 – What the…?
The first thing I addressed when I started the analysis was the way we stored reporting data and how we were getting the data to build dashboards and reports. Basically, all the information — even the semi-aggregated data — was stored inside the main database together with the operational data. Moreover, the software modules in charge of getting the data and building reports and dashboards were part of the main backend system.
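To make that setup concrete, here is a minimal sketch of the pattern, not our actual code. It assumes an in-memory SQLite database as a stand-in for the main database, and the table names (`orders`, `daily_sales`) and function names are hypothetical, chosen only for illustration. The point it shows is that operational writes, the semi-aggregated reporting table, and the dashboard reads all share one database and one backend module.

```python
import sqlite3

# Assumption: SQLite stands in for the main database; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT,
        amount REAL,
        created_on TEXT
    );
    -- Semi-aggregated reporting data lives next to the operational table.
    CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL);
    """
)


def place_order(customer, amount, day):
    """Operational write path used by the application."""
    conn.execute(
        "INSERT INTO orders (customer, amount, created_on) VALUES (?, ?, ?)",
        (customer, amount, day),
    )


def refresh_daily_sales():
    """Reporting path: rebuild the aggregate from operational rows."""
    conn.execute("DELETE FROM daily_sales")
    conn.execute(
        "INSERT INTO daily_sales (day, total) "
        "SELECT created_on, SUM(amount) FROM orders GROUP BY created_on"
    )


def dashboard_rows():
    """Dashboard read path, served by the same backend and database."""
    return conn.execute(
        "SELECT day, total FROM daily_sales ORDER BY day"
    ).fetchall()


place_order("acme", 100.0, "2024-01-01")
place_order("globex", 50.0, "2024-01-01")
place_order("acme", 25.0, "2024-01-02")
refresh_daily_sales()
report = dashboard_rows()
```

In a layout like this, every report refresh and every dashboard read competes with operational traffic for the same database.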
In a scenario where there aren’t many concurrent users and the number of records is not in the hundreds of thousands, this approach — while not ideal — is not necessarily wrong.
Fortunately for us, our number of concurrent users increases every day, and with it the amount of data we host. This means that we need to completely change our approach to