Building a Cloud Data Platform from the Ground Up - Lessons Learned as a Data Engineer

Wazir Rohiman
October 9, 2024

Less than three months ago, I was assigned a team of data engineers to build an internal cloud data platform at Calybre. At the time, I didn’t fully grasp the sheer scale of challenges — and opportunities — I’d encounter. I quickly realised that this wasn’t just about writing a bunch of scripts and deploying infrastructure; it was about orchestrating security, architecture, automation, and team leadership to build something robust and scalable.

A data platform is the foundation of any analytics, machine learning, and AI initiatives. For some businesses, it’s the bedrock of their data-driven organisation, leveraging their data operationally far beyond typical business intelligence use cases. A data platform on the cloud means that your organisation can benefit from the scalability, flexibility, disaster recovery, and pay-as-you-go pricing models that are generally associated with the cloud. If set up properly, your cloud data platform can offer a centralised, unified environment where data from various sources is collected, processed, and made accessible for a wide range of use cases — from real-time analytics to advanced machine learning models. It becomes a single source of truth, enabling consistent data governance, security, and operational efficiency across the organisation. This ensures that your teams can focus on extracting insights and driving innovation without worrying about the underlying infrastructure.

As a team, we had our fair share of challenges and made some mistakes along the way, all of which helped us better understand what it takes to build a robust data platform on Azure. Here are some of the key lessons I’ve learned through this process:

1. Security Is Everything: Get It Right from Day One

In hindsight, I underestimated how big an impact security has on anything technology-related. Security should never be an afterthought, period. It’s one of those things that, if overlooked, can come back to haunt you — not just in terms of data breaches, but also in cloud costs spiralling out of control. There are two key aspects of security to consider on a data platform: the most obvious, as a data professional, is the data itself; equally important is securing the cloud resources that store and process that data.

If you lock down permissions properly from the get-go — like controlling who can create clusters, who can read/write to storage, and defining which identities can access what — you avoid common pitfalls that lead to accidental cost escalation or, worse, leaving your platform vulnerable to attacks and data breaches. On our data platform, we went through a couple of iterations until we got it to a secure level. This is to say: security is not a one-off thing; you constantly need to keep monitoring it as you scale the platform and as your use cases evolve.
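As an illustration, locking down storage access behind a dedicated identity might look like the following Terraform sketch. All resource names, regions, and role choices here are hypothetical — adapt them to your own platform.

```hcl
# Hypothetical example: least-privilege access in Terraform.
# Names, regions, and scopes are illustrative only.

resource "azurerm_resource_group" "platform" {
  name     = "rg-dataplatform-dev"
  location = "westeurope"
}

resource "azurerm_storage_account" "landing" {
  name                     = "stdataplatformlanding"
  resource_group_name      = azurerm_resource_group.platform.name
  location                 = azurerm_resource_group.platform.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

# A managed identity scoped to the ingestion workload only.
resource "azurerm_user_assigned_identity" "ingest" {
  name                = "id-ingest-dev"
  resource_group_name = azurerm_resource_group.platform.name
  location            = azurerm_resource_group.platform.location
}

# Grant data-plane access to this one storage account, rather than a
# broad role at subscription level.
resource "azurerm_role_assignment" "ingest_blob_contributor" {
  scope                = azurerm_storage_account.landing.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_user_assigned_identity.ingest.principal_id
}
```

Keeping role assignments scoped to individual resources like this makes it far easier to audit who can touch what as the platform grows.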

Getting this right early on lays a solid foundation for scaling the platform securely.

2. The Core Components: Storage, Compute, and Orchestration

At its heart, every cloud data platform boils down to a few major components: storage, compute, and orchestration.

For storage, it’s important to understand that your decisions here — whether to use cold or hot tiers — can make or break your platform’s efficiency. How you organise your data in storage plays a crucial role in determining whether you’re building siloed data stores or a democratised single source of truth, and it shapes how data access patterns are designed. Storage is ultimately the backbone of your data system — don’t be afraid to spend time designing it properly before you start building. That includes knowing how your users will ingest data, what format the data is in, and the type of data the platform users intend to process. That takes us to the next component: compute.

Calybre is a Databricks Partner, so it only made sense to lean on Azure Databricks as our compute solution. Its simplicity allowed us to focus on what really mattered — analytics. Databricks’ Unity Catalog not only provides a layer for data observability and governance, but also lets the data platform team establish finer-grained access control at the catalog and schema levels.
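As a sketch of what that looks like in practice, Unity Catalog lets you express access as SQL grants. The catalog, schema, and group names below are hypothetical:

```sql
-- Hypothetical names; run from a Databricks SQL context.
-- Let the analytics group see and use the catalog...
GRANT USE CATALOG ON CATALOG platform_dev TO `analytics-team`;

-- ...but scope read access to a single schema within it.
GRANT USE SCHEMA ON SCHEMA platform_dev.sales TO `analytics-team`;
GRANT SELECT     ON SCHEMA platform_dev.sales TO `analytics-team`;
```

Because privileges cascade from catalog to schema to table, granting at the right level keeps the permission model small and auditable.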

Lastly, there’s ingestion orchestration, which is often treated as an afterthought. It’s the glue that connects users’ data to the platform. A smooth orchestration layer ensures that data flows seamlessly between storage and compute, enabling analytics and reporting without hitches. On our platform, we implemented an internally built metadata management layer for orchestration, leveraging Azure Data Factory and Azure SQL. This gives teams a centralised place within the platform to manage, automate, and monitor data ingestion and the relevant metadata.
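As a rough sketch of what such a metadata layer can look like — the table and column names below are entirely illustrative, not our actual schema — a single Azure SQL table can drive parameterised ingestion:

```sql
-- Hypothetical metadata table (Azure SQL / T-SQL) driving parameterised
-- Data Factory ingestion pipelines. Columns are illustrative only.
CREATE TABLE dbo.IngestionMetadata (
    SourceId        INT IDENTITY(1,1) PRIMARY KEY,
    SourceName      NVARCHAR(100) NOT NULL,  -- logical name of the source
    SourcePath      NVARCHAR(400) NOT NULL,  -- file path or table reference
    FileFormat      NVARCHAR(20)  NOT NULL,  -- e.g. 'parquet', 'csv'
    TargetSchema    NVARCHAR(100) NOT NULL,  -- destination schema in the lake
    LoadType        NVARCHAR(20)  NOT NULL,  -- 'full' or 'incremental'
    WatermarkColumn NVARCHAR(100) NULL,      -- used for incremental loads
    IsEnabled       BIT NOT NULL DEFAULT 1,
    LastLoadedAt    DATETIME2 NULL
);
```

A Data Factory pipeline can then read the enabled rows with a Lookup activity and fan out copy runs with a ForEach, so onboarding a new source becomes a metadata insert rather than a new pipeline.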

3. Data Architecture Patterns Are More Than Just Data Pipeline Designs

When we talk about data architecture, the conversation tends to centre on data pipelines. But I’ve learned that architectural patterns extend way beyond just pipelines. For a platform to function like a well-oiled machine, you need patterns for Continuous Integration and Continuous Delivery/Deployment (CI/CD), resource segmentation, storage, and — once again — security.

With CI/CD patterns in place, we’ve been able to automate our deployments, reducing human error and increasing the speed of development. In a cloud environment, where resources are constantly changing, this automation is invaluable. Resource segmentation has also been a key factor in making our platform scalable, secure, and manageable. By segmenting environments and resources, we’re able to control performance and compute allocation, as well as monitor costs for each environment.

4. Infrastructure as Code: Code Is Not the Difficult Part

Terraform code can be beautifully complex, but writing Infrastructure as Code (IaC) in Terraform is the easier part of building a cloud data platform. Truly understanding how resources interact with each other in an automated workflow? That’s the real challenge.

It’s not enough to know how to spin up an environment with code; you need to understand how different resources depend on each other and how they’ll interact once provisioned. For instance, in our platform, understanding how security, delegated access, storage, and compute services talk to each other was vital in ensuring that the platform works in an automated and secure way.
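For example, Terraform infers most ordering from references between resources, but some dependencies have to be stated explicitly. The sketch below is hypothetical — names, roles, and the exact propagation behaviour will vary in your environment:

```hcl
variable "deployer_principal_id" {
  description = "Object ID of the deployment identity (supplied elsewhere)"
  type        = string
}

resource "azurerm_storage_account" "lake" {
  name                     = "stplatformlake"
  resource_group_name      = "rg-dataplatform-dev"
  location                 = "westeurope"
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # ADLS Gen2 hierarchical namespace
}

resource "azurerm_role_assignment" "deployer_blob_contributor" {
  scope                = azurerm_storage_account.lake.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.deployer_principal_id
}

# Implicit dependency: referencing the account's name tells Terraform to
# create the account before the container.
resource "azurerm_storage_container" "raw" {
  name                 = "raw"
  storage_account_name = azurerm_storage_account.lake.name

  # Explicit dependency: nothing here references the role assignment, yet
  # data-plane operations can fail until access is granted — Terraform
  # cannot infer that ordering, so it is declared with depends_on.
  depends_on = [azurerm_role_assignment.deployer_blob_contributor]
}
```

Most of our troubleshooting time went into exactly these gaps between what Terraform can infer and what the cloud actually requires.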

Initially, we encountered a few hiccups — especially with resource dependencies — and while writing the Terraform scripts may have been the easy part, troubleshooting how those resources connected to each other was the real learning experience. The takeaway? Don’t underestimate the complexity of resource interactions and the delegation of security principals in a cloud environment. Having an in-depth understanding of the cloud services you use is a definitive advantage as a data engineer — whether you are building a cloud data platform or not.

5. Leading a Data Platform Team: Leadership Lessons from the Field

One of the most rewarding parts of this project has been leading the team that built the platform. As a data engineer, I was more familiar with being elbow-deep in the technical parts of a project. But leading a team required a different skill set — one centred around delegation, communication, and trust.

It became clear early on that each team member needed to own a specific part of the platform. Whether it was security, DevOps, or IaC, giving ownership to the right person ensured that every part of the platform was well looked after. As the team lead, I made sure to give my engineers the space and time to learn and figure things out. Problem-solving sessions were crucial at the beginning of the project, as we dived deep into many cloud concepts we had never been exposed to before.

Technical walkthroughs were essential — by bringing the team together to brainstorm and figure out how each piece of the platform worked, we were able to keep everyone aligned and moving in the same direction.

Another key leadership lesson? Balancing business and technical needs. As much as I wanted to build the most technically perfect platform, I had to make sure it aligned with the business’s goals, timelines, and, most importantly, budget. That meant making trade-offs, but it also meant building something that provided real value. Our goal was to prioritise the release of a working platform that allowed the internal data teams at Calybre to dip their toes in and experiment with their own project data pipelines.

6. Leverage Your Data Platform’s Data

As with any other technology system, a data platform can be a treasure trove of operational data. If set up properly from the beginning, it will give your team valuable information for future optimisation. You can leverage Databricks’ system logs to monitor compute usage and cluster performance, and Azure provides a suite of tools for monitoring the costs of resources and services. Cloud costs can become unpredictable if your platform is not set up properly. Adding monitoring capabilities might increase the delivery time of the platform, but being able to monitor its vitals is essential to its long-term success.
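For instance, if the Unity Catalog system tables are enabled on your workspace, a query along these lines can surface consumption trends — treat the schema details as an assumption to verify against your own workspace:

```sql
-- Hypothetical monitoring query over Databricks system tables
-- (requires Unity Catalog with system schemas enabled).
SELECT
  usage_date,
  sku_name,
  SUM(usage_quantity) AS dbus_consumed
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), 30)
GROUP BY usage_date, sku_name
ORDER BY usage_date;
```

Even a simple daily roll-up like this, charted over time, makes unexpected cost spikes visible long before the monthly invoice does.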

A Data Platform is Never Truly ‘Done’

Building a cloud data platform from scratch has been an intense but rewarding learning experience. It pushed me to dive deep into areas like cloud security, data architecture, and leadership in ways I hadn't anticipated. The key lesson? This kind of project is about far more than just writing code — it’s about understanding the interplay between technology, security, automation, and business needs.

For any data engineer taking on a similar challenge, my advice is simple: take ownership of your part of the platform, but don’t lose sight of the bigger picture. Understand how all the components — security, compute, storage, and orchestration — fit together, and always consider how your platform can grow with the business. At the end of the day, it’s not just about building something that works — it’s about creating a foundation that will scale, adapt, and add value over time.

Prioritise the fundamentals: security, scalability, and automation, and remember — the platform is never “done.” It’s always evolving and being optimised.
