Foreword
This is the story of Business Intelligence and Analytics at Orfium, from before there was a single BI team member (or a team at all) within the company, to now, where we have a BI organization that scales, clear data goals, and exciting plans for future projects.
Our story is a long one, and one we're enthusiastic to tell. We'll go through the timeline of our journey to the present day, and we fully plan to elaborate on the main points discussed here in their own articles.
We hope you’ll enjoy the ride. Buckle up, here we go!
Where we started
Before we formally introduced Business Intelligence to Orfium, there were a few BI-adjacent functions at the company. A number of Data Engineers, Operations Managers, and Finance execs created some initial insights with manual data pipelines.
These employees gathered insights on two main parts of the business.
1. Operations Insights – Department and Employee Performance
To get base-level information on the performance of departments and specific employees, a crew of Data Engineers and Operations Managers with basic scripting skills put together a Python script, which was not without its bugs. The script pulled data from CSV exports from our internal software, as well as exports from AWS S3 provided by the DE teams, and joined them to produce a final table. All transformations were performed within the script. No automation was initially required, as we mostly needed data on a monthly basis. The final table would then be loaded into Excel and analyzed through pivot tables and graphs.
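For illustration, here's a minimal sketch of what a script like that might have looked like; the file names, columns, and aggregation are hypothetical, not the actual ones we used:

```python
import pandas as pd

# Hypothetical sketch of the original manual pipeline; file and column
# names are illustrative, not the ones we actually used.
internal = pd.read_csv("internal_software_export.csv")  # CSV export from internal software
s3_data = pd.read_csv("s3_export.csv")                  # export provided by the DE teams

# Join the two sources and compute a simple monthly aggregate.
merged = internal.merge(s3_data, on="employee_id", how="left")
merged["month"] = pd.to_datetime(merged["completed_at"]).dt.to_period("M")
final_table = (
    merged.groupby(["department", "employee_id", "month"], as_index=False)
          .agg(tasks_completed=("task_id", "count"))
)

# Write the final table so it can be opened in Excel for pivot tables and graphs.
final_table.to_excel("operations_report.xlsx", index=False)
```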
This solution provided some useful insights, but it certainly couldn't scale along with the organization. A few problems came up along the way that it couldn't solve: we needed more frequent updates of daily data, the ability to view historical performance, and ways to join this data with important data from other sources. All of these were reasons why Orfium sought a bigger, smarter, and more scalable approach to BI.
2. Client Insights
Data Engineers put together a simple dashboard in Amazon QuickSight to give clients insights into the revenues we were generating for them. The data flowed from AWS S3 tables the engineers had created, and the dashboard displayed bar charts of revenues over time, with some basic filtering. It was maintained for a couple of years but was ultimately replaced in March 2022 with a more comprehensive solution from the BI team (spoiler alert: we created a BI team).
A new BI era
In light of some of the issues mentioned above, a small team of BI Analysts was assembled to help with the increasing needs of the Operations team.
The first decision the BI Analysts made was which tools to use for visualization and the ETL process. Nikos Kastrinakis, Director of Business Intelligence, had worked with Tableau previously, so he ran a demo and trial with Tableau and ultimately convinced the team to adopt it as our visualization tool, with Tableau Prep as our ETL tool. The company was now storing all relevant data in AWS S3, and the Data Engineers used AWS Athena to create views that transformed the raw data into usable tables that BI could join in Tableau Prep.
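As a rough sketch of that handoff, here's how a view like the ones the Data Engineers built might be registered in Athena from Python; the database, table, column names, and S3 locations are all hypothetical:

```python
import boto3

# Hypothetical sketch of registering an Athena view over raw S3 data;
# database, table, and column names are illustrative.
athena = boto3.client("athena")

CREATE_VIEW_SQL = """
CREATE OR REPLACE VIEW operations_daily AS
SELECT
    department,
    employee_id,
    date_trunc('day', completed_at) AS work_date,
    count(*) AS tasks_completed
FROM raw_operations_events
GROUP BY 1, 2, 3
"""

athena.start_query_execution(
    QueryString=CREATE_VIEW_SQL,
    QueryExecutionContext={"Database": "analytics"},                 # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://athena-results/"},  # query results bucket
)
```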
During the Tableau trial, the BI team started working on the first dashboard, set to be used by the Operations department and to replace the aforementioned buggy script. We created one Dashboard to rule them all, with information on overall department performance, employee performance, and client performance. This gave users their first taste of the power of a BI team. Our goal was to answer many of their questions in one concise dashboard, complete with historical breakdowns of different types of data, and to surface insights that users hadn't seen before. The Tableau trial ended right before the Dashboard was set to launch, so of course we purchased our first Tableau licenses for Orfium and onboarded the initial users with the launch of this Dashboard.
It was a huge success! The Operations team was able to phase out the script, stop wasting time every month generating reports for themselves, and discover a whole new way of gaining insights.
Our work with the Operations team didn't stop there. Over the following months we continued to build on this data stack, creating further automation to bring daily data to the Operations team so they could manage departments and employees in near real-time. But this introduced some new challenges.
With the introduction of daily estimated data, the frequency of updates and the size of the views made Tableau extracts unusable, so we faced a tradeoff between data freshness and dashboard responsiveness. Most stakeholders were happy to wait 30 more seconds when opening their dashboards, knowing that they had the most up-to-date data possible; Operations needed to be agile in their decisions and actions, so fresh data was very important to them. To date, members of the Operations team remain the most active users of Tableau at Orfium and have been active participants in other data initiatives across the company.
The reception of these initial dashboards was amazing. Stakeholders could derive value and make smarter decisions faster, so the BI team gained confidence and trust. The BI team was still mainly serving the Operations department (with some requests completed for client and corporate insights) but was starting to receive many requests from Finance, Product, and other departments. We began to add BI Analysts to serve these needs. This was just the beginning of a larger team that could serve more customers more effectively, as we also began improving internal tech features and adopting ready-made external software solutions.
Where we are today
We had many questions to clear up:
Where do we store our data? How do we transform it? Who is responsible for these transformations? Who is responsible for the ready, delivered data points? Who has access, and how do they get it? Where do we do our analyses? How does data move between platforms and tools? How do our data customers discover our work?
Months and months of discussions between departments on all these questions led to a series of decisions and commitments about our strategic data plans.
Where we stand now is still a transition from that previous stage, as we decided to take a giant step forward by embracing the Data Mesh initiative. We'll have the chance to talk about some of the terms and combinations of software we're about to mention in future blog posts, but we can run through the basics right now.
Our company is growing very quickly and, given that we prefer being Data-(insert cliché buzzword here), the needs and requests for BI and data analysis are growing at double the speed.
Growing the number of BI Analysts was inevitable, given the increasing requests and the addition of new departments that needed answers to their data questions. By hiring more BI Analysts, we split our workforce between our two main data customers and created two BI Squads.
One is focused on finance and external client requests. We named it the Corporate squad, and it consists of a BI Manager and two BI Analysts. This is the team that prepares the monthly Board meeting materials (P&L, balance sheets) and the dashboards we share with external customers, which use data to demonstrate the impact of our work on their revenue and give them a better understanding of their performance on YouTube. The squad also takes on many urgent ad-hoc requests each month; it has a zero-tolerance policy for mistakes and usually works on a monthly revision/request cycle.
The second squad is more focused on analyzing and evaluating the performance and usage of our internal products, and on connecting that information with the performance of our Operations teams, which generate the largest portion of our revenue. This squad, which happens to work across two different time zones, also consists of a BI Manager and two BI Analysts. It has more frequent deadlines, as new features ship fast and need evaluation, and the nature of the data and the continuous evolution of the data model make its data less robust.
In the meantime, we had already realized that we had to bulletproof our infrastructure and technical skills before scale caught up with us. We decided to keep some team members focused on delivering value by creating useful analyses, as described above, while reserving time and people to pave the way for the rest of the analysts to create more value, more efficiently.
We researched the community's thinking on this and found the term Analytics Engineer, which seemed very close to what we were looking for. We thought this would be so important for the team that we went one step further and created a separate role, the equivalent of the Staff Engineer on software engineering teams. This role focuses on setting the technical direction of the department, researching new technologies, consulting on how projects should be driven, and enforcing best practices within the Analytics Chapter of the Data Unit. Quality, performance, and repeatability are the three core values of the code this department produces.
We currently have a team of 8 people: the BI Director, two squads of one BI Manager and two BI Analysts each, plus the Staff BI Analyst.
In terms of skillsets, we left Python behind. Instead, we're focusing on writing reliable, performant SQL and collaborating efficiently as a team on Git. Our new toolkit was also largely shaped by our commitment to a centralized data mesh, currently hosted on Snowflake. Nothing is hosted or processed locally anymore: we develop data pipelines in SQL, apply them to our dev/staging/production environments through dbt, and orchestrate scheduling and data freshness with Airflow. We are the owners of our own Data Product, Orfium's BI layer: a schema in Orfium's production database where we store the fresh, quality, documented data resulting from our processes. This set of tables connects data from other teams' data products (internal products, external reports, data science results) and creates interoperating tables. These tables are the base for all our Tableau Dashboards and help other teams use curated data without having to reinvent the Orfium data model on their own. Our Data Product and our Tableau sites, with all of our dashboards, are fully documented and enriched with metadata, so our data discovery tool, Select Star, lets stakeholders search and find all aspects of our work.
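To make that concrete, here's a minimal sketch of how dbt runs might be scheduled with Airflow; the DAG name, project path, schedule, and targets are assumptions for illustration, not our actual setup:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical sketch of orchestrating a dbt project with Airflow;
# the project path, schedule, and target names are assumptions.
with DAG(
    dag_id="bi_layer_refresh",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # keep the BI layer fresh daily
    catchup=False,
) as dag:
    # Build the SQL models that make up the BI layer schema.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/bi_layer && dbt run --target prod",
    )

    # Run dbt tests so only quality-checked data is published downstream.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/bi_layer && dbt test --target prod",
    )

    dbt_run >> dbt_test
```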
The future
Data Mesh was a big bet for Orfium, and we will continue to build on it. The principles are in place, and we are in the process of onboarding all departments onto the initiative so we can take advantage of the outcomes to the fullest extent. When this hard process is complete, all teams will enjoy centralized data and the interoperability that derives from it, domain-driven data ownership, and agreed levels of data quality, all of which will help Orfium become more data-powered.
In addition to the obvious outcomes of applying the Data Mesh principles in a company, we believe that we need to follow up with two more major bets.
We decided to initiate, propose, and promote a Data Culture at Orfium. This is a very big project, one deep enough that all employees will need to step out of their comfort zones to achieve it. We need to change the way we work and start planting data seeds early in all our projects, products, initiatives, and working habits so that we can enjoy the results later on. This initiative will come with a Manifesto, which is actively being written and will be published soon. It will require commitment and follow-through on the principles proposed so that we achieve our vision.
Self-Service Analytics is also one of the principles Data Mesh is built on, and we decided to move forward emphatically with this too. Data will be generally accessible on Snowflake by everyone, but data analysis requires data literacy, SQL chops, and infrastructure that can host large amounts of data in an analysis. We decided to use Metabase as the proxy and facilitator for Self-Service Analytics. It provides the infrastructure by analyzing the data server-side rather than locally, and its query builder is an excellent tool for creating no-code analyses. Admittedly, it is not as customizable as SQL, but it will cover 85% of business users' needs with far better usability for non-technical users.
This leaves us with data literacy and consultancy. For this, we have set up a library of best practices, examples, tutorials, and courses explaining how to handle business questions, analyses, tool limitations, and more. At Orfium we always want to take it a step further, though, so we have been working to formulate a new role that will provide in-depth consultancy on data issues. This role will act like a family doctor, someone you know personally and trust, and will handle all incoming requests about data problems. Even if the data doctor cannot help you directly, they can refer you to a more suitable "doctor": a set of more specialized experts, each in their own data sector (infrastructure, SQL, data visualization, analysis requirements, you name it).
To infinite data and beyond
What a journey these last 3-4 years have been for BI at Orfium! We have gone through a lot, from having no official Business Intelligence to a BI team that has plans, adds value for the organization, and inspires all teams to embrace the data-driven lifestyle. We've done a great job so far, and we have great plans for the future too.
"It's a long way to the top (if you wanna rock 'n' roll)" – AC/DC