Which is better suited for your organisation — data mesh or data lake? — Vatsa Solutions Blog

What a data lake actually is

A data lake is a large repository of raw data stored at low cost — originally on Hadoop clusters, now typically on object storage like Azure Data Lake Storage Gen2 or AWS S3. The original vision was "store everything, figure out the schema later." In practice, the best implementations add structure: the medallion architecture with bronze (raw), silver (cleansed), and gold (business-ready) layers gives you the flexibility of a lake with some of the reliability guarantees of a warehouse.
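
To make the medallion layers concrete, here is a minimal sketch in plain Python. The record fields (`patient_id`, `charge`), the function names, and the cleansing rules are illustrative assumptions; a real lake would run equivalent logic over files or tables on object storage rather than in-memory lists.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as they arrived (hypothetical sample data).
bronze = [
    {"patient_id": "p1", "charge": "120.50"},
    {"patient_id": "p1", "charge": "80"},
    {"patient_id": "", "charge": "15"},      # missing key
    {"patient_id": "p2", "charge": "n/a"},   # unparseable amount
    {"patient_id": "p2", "charge": "40.25"},
]


def to_silver(rows):
    """Cleanse: drop rows without a key and cast the charge to float."""
    silver = []
    for row in rows:
        if not row["patient_id"]:
            continue
        try:
            charge = float(row["charge"])
        except ValueError:
            continue  # skip amounts that cannot be parsed
        silver.append({"patient_id": row["patient_id"], "charge": charge})
    return silver


def to_gold(rows):
    """Aggregate: total charge per patient, ready for reporting."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["patient_id"]] += row["charge"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'p1': 200.5, 'p2': 40.25}
```

The bronze list is never modified, which mirrors the idea that the raw layer stays available for reprocessing when cleansing rules change.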

The business case for a lake rests on three properties: low storage cost, schema flexibility, and support for diverse workloads — batch analytics, real-time streaming, machine learning model training, and ad hoc exploration — from a single storage layer. Modern lake implementations on Azure Synapse, Databricks, or AWS EMR with Delta Lake or Apache Iceberg have substantially closed the performance and reliability gaps that once made warehouses the default for structured analytics.

What a lake does not provide, without deliberate investment, is data quality, discoverability, and clear ownership. A lake without a data catalogue, without lineage tooling, and without governance conventions becomes a data swamp. This is not a technology failure; it is a governance failure. The distinction matters because it determines the fix: a governance failure is repaired by assigning ownership and enforcing conventions, not by migrating to a different platform.

What data mesh actually is

Data mesh, as defined by Zhamak Dehghani, is an organisational approach to data — not a technology stack. It has four principles:

Domain ownership. Data is owned, produced, and maintained by the business domain closest to it — not a central data team. The healthcare operations domain owns the clinical encounter data; the finance domain owns the GL data.
Data as a product. Each domain produces data products (named, versioned, documented, and held to SLAs), not just raw exports. Consumers treat them as products, and the expectations that come with that make the producing domain accountable.
Self-serve data infrastructure. A central platform team provides the infrastructure and tooling that makes it practical for domain teams to build and publish data products without deep data engineering expertise.
Federated computational governance. Standards and interoperability requirements are set centrally; implementation is the domain's responsibility. Think of it as policy-as-code applied to data quality and access control.
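
The second and fourth principles can be sketched together: a data product carries a machine-readable descriptor, and centrally defined policies are functions evaluated against it. This is a minimal sketch in Python. The `DataProduct` fields, the policy names, and the 24-hour freshness ceiling are assumptions for illustration, not part of any data mesh standard.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain publishes alongside its data."""
    name: str
    version: str
    owner_domain: str
    description: str = ""
    freshness_sla_hours: Optional[int] = None
    pii_columns: List[str] = field(default_factory=list)
    masked_columns: List[str] = field(default_factory=list)


# A policy returns an error message, or None when the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def must_be_documented(p: DataProduct) -> Optional[str]:
    return None if p.description.strip() else "missing description"


def must_declare_sla(p: DataProduct) -> Optional[str]:
    if p.freshness_sla_hours is None:
        return "no freshness SLA declared"
    if p.freshness_sla_hours > 24:  # assumed organisation-wide ceiling
        return "freshness SLA exceeds 24 hours"
    return None


def pii_must_be_masked(p: DataProduct) -> Optional[str]:
    unmasked = sorted(set(p.pii_columns) - set(p.masked_columns))
    return f"unmasked PII columns: {unmasked}" if unmasked else None


# Set centrally; each domain decides how to satisfy them.
CENTRAL_POLICIES: List[Policy] = [
    must_be_documented,
    must_declare_sla,
    pii_must_be_masked,
]


def evaluate(product: DataProduct,
             policies: List[Policy] = CENTRAL_POLICIES) -> List[str]:
    """Run every policy; an empty list means the product may be published."""
    return [msg for policy in policies if (msg := policy(product)) is not None]


encounters = DataProduct(
    name="clinical_encounters",
    version="1.2.0",
    owner_domain="healthcare_operations",
    description="One row per clinical encounter.",
    freshness_sla_hours=6,
    pii_columns=["patient_name"],
)
violations = evaluate(encounters)
print(violations)  # ["unmasked PII columns: ['patient_name']"]
```

In this sketch the central team owns `CENTRAL_POLICIES`, while the domain owns the descriptor and whatever masking work is needed to clear the violation.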

The important implication: data mesh does not tell you what technology to use. A domain could serve its data product from a Postgres replica, a Parquet partition on ADLS, or a Delta Live Tables pipeline on Databricks. The architectural pattern is the contract, not the implementation.
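
One way to picture "the contract, not the implementation" is an interface that consumers depend on while each domain chooses its own backend. The sketch below uses Python's `typing.Protocol`. The `DataProductPort` name, its two methods, and the in-memory and CSV backends are hypothetical stand-ins for the Postgres, Parquet, or Delta options mentioned above.

```python
import csv
import io
from typing import Dict, Iterator, List, Protocol


class DataProductPort(Protocol):
    """The contract every data product exposes (illustrative)."""

    def schema(self) -> List[str]: ...

    def read(self) -> Iterator[Dict[str, str]]: ...


class InMemoryProduct:
    """Backend 1: rows held in a list, standing in for a database replica."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self._rows = rows

    def schema(self) -> List[str]:
        return sorted(self._rows[0]) if self._rows else []

    def read(self) -> Iterator[Dict[str, str]]:
        return iter(self._rows)


class CsvProduct:
    """Backend 2: CSV text, standing in for files on object storage."""

    def __init__(self, text: str) -> None:
        self._text = text

    def schema(self) -> List[str]:
        reader = csv.DictReader(io.StringIO(self._text))
        return sorted(reader.fieldnames or [])

    def read(self) -> Iterator[Dict[str, str]]:
        return iter(csv.DictReader(io.StringIO(self._text)))


def count_rows(product: DataProductPort) -> int:
    """A consumer that relies only on the contract, never on the backend."""
    return sum(1 for _ in product.read())


finance = InMemoryProduct([
    {"account": "4000", "amount": "10"},
    {"account": "5000", "amount": "7"},
])
clinical = CsvProduct("encounter_id,ward\ne1,A\ne2,B\ne3,A\n")

print(count_rows(finance), count_rows(clinical))  # 2 3
```

Neither backend inherits from `DataProductPort`; they satisfy it structurally, so a domain can swap its storage without consumers changing any code.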

Why the debate is often unproductive

The "data mesh vs data lake" question conflates two separate decisions: architecture and operating model. They need to be answered independently.

The architecture question — lake vs warehouse vs lakehouse — is a workload question. What types of queries? What latency requirements? What team skills? What existing investments? Databricks on ADLS is the right answer for some organisations; Snowflake is right for others; Synapse Analytics is right for others still. The workload determines the architecture.

The operating model question — centralised vs domain-owned data — is an organisational maturity question. Can your domain teams take on data product ownership? Do you have the platform engineering capacity to build a self-serve infrastructure? Does your organisation have the governance discipline to enforce federated standards without centralised control? These are questions about people and process, not technology.

Most organisations that "adopt data mesh" end up implementing a partial version — domain ownership in name, with centralised governance in practice. This is not failure. It is an intermediate state on a journey that takes years, not months. A central data team acting as a platform team while domains gradually take on product ownership is a reasonable transitional model.

A practical way to choose

We use four questions to structure this decision with clients:

What is the primary workload? Ad hoc exploration and ML training favour a lake. High-concurrency BI with strict SLAs favours a warehouse. Mixed workloads favour a lakehouse. Answer this first — it narrows the technology question significantly.
How mature are your domain teams? Data mesh works best when domain teams already understand the data they produce. Teams that are still building that understanding will struggle to own and operate a data product. An intermediate model — central data engineering with domain input — is often more realistic.
What does your governance model need to be? Regulated industries often need a stronger centralised governance layer than pure data mesh envisions. Data quality, lineage, access control, and audit trails are easier to enforce centrally — but they can be federated if you invest in the right policy tooling.
What is already in place? If your organisation has three years of investment in a Snowflake warehouse, the question is how to extend it — not whether to replace it with a lake. Sunk cost logic aside, institutional knowledge and data quality already embedded in the warehouse have real value.
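
The first question lends itself to a small lookup. The sketch below encodes only the rule of thumb stated above; the workload labels and the `suggest_architecture` function are hypothetical, and a real assessment would weigh the other three questions as well.

```python
from typing import Iterable

# Workload labels are illustrative, not a standard taxonomy.
LAKE_WORKLOADS = {"ad_hoc_exploration", "ml_training"}
WAREHOUSE_WORKLOADS = {"high_concurrency_bi"}


def suggest_architecture(workloads: Iterable[str]) -> str:
    """Map the primary workloads to a starting point for the technology choice."""
    wanted = set(workloads)
    unknown = wanted - LAKE_WORKLOADS - WAREHOUSE_WORKLOADS
    if unknown:
        raise ValueError(f"unrecognised workloads: {sorted(unknown)}")

    needs_lake = bool(wanted & LAKE_WORKLOADS)
    needs_warehouse = bool(wanted & WAREHOUSE_WORKLOADS)

    if needs_lake and needs_warehouse:
        return "lakehouse"  # mixed workloads
    if needs_warehouse:
        return "warehouse"
    if needs_lake:
        return "lake"
    raise ValueError("no workloads given")


print(suggest_architecture(["ml_training"]))                         # lake
print(suggest_architecture(["high_concurrency_bi"]))                 # warehouse
print(suggest_architecture(["ml_training", "high_concurrency_bi"]))  # lakehouse
```

The output is a starting hypothesis only: existing investments and team skills, covered by the other questions, can reasonably override it.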

The organisations that get this right typically start with a clear technology decision (lake or warehouse, based on workload) and then gradually introduce domain ownership principles as team capability matures — rather than trying to transform both at once.

