Observing Data: Monte Carlo Basics

Introduction

Monte Carlo Data is a data observability platform that enables organizations to monitor the quality of their data assets. Founded in 2019 by Barr Moses and Lior Gavish, the company has raised $236 million and reached unicorn status, with a Series D valuation of $1.6 billion. Moses was inspired to start the company by frustrations she experienced as a data consumer at Gainsight, a product that helps companies manage customer relationships. After speaking with 80 other data engineering teams, Moses confirmed that her problems were shared across a variety of organizations. Convinced, she raised a seed round to build Monte Carlo Data. Around this time, she published a Medium article, “The Rise of Data Downtime”, detailing the problems she faced and coining a term for the problem she set out to solve. Today, Monte Carlo Data sells its platform to some of the largest data-driven companies in the world, with more than 150 customers. This article provides a brief overview of data observability and Monte Carlo’s primary features.

What is Data Observability?

The amount of information used to drive business decisions is growing. As organizations use more data, they require more sophisticated systems to manage it. A modern data infrastructure ingests data from a variety of sources, performs many transformations on it, stores it in a data lake or warehouse, and ultimately serves it to a variety of BI tools and downstream systems. Failures at any point in this pipeline can introduce quality issues with major business impacts. Data observability is the monitoring of an organization’s data assets with the purpose of quickly identifying issues and managing their resolution.

A data observability tool automatically connects to your data infrastructure, discovers the set of assets in your deployment, monitors these assets for issues, and raises incidents when issues occur. When monitoring a data asset, a data observability tool considers characteristics like:

  • Volume: Anomalies in the number of table rows.
  • Freshness: Anomalies in the update patterns for a table.
  • Content: Anomalies in the data’s values.
  • Schema: Anomalies in the table metadata.
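To make the idea of an automated monitor concrete, here is a minimal sketch of a volume check in Python. This is an illustration of the general technique (flagging a new observation that deviates sharply from history), not Monte Carlo's actual model, and the row counts are made up:

```python
from statistics import mean, stdev

def volume_anomaly(history, latest, threshold=3.0):
    """Flag `latest` as anomalous if it deviates from the historical
    mean by more than `threshold` standard deviations.
    (Simplified illustration -- not Monte Carlo's actual algorithm.)"""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# A table that normally receives ~10k rows per load:
counts = [10_120, 9_980, 10_050, 10_210, 9_890, 10_005]
print(volume_anomaly(counts, 10_100))  # False: a typical load
print(volume_anomaly(counts, 1_200))   # True: suspiciously small load
```

Real observability tools use richer models (seasonality, trend, update cadence), but the shape of the problem is the same: learn normal behavior from history, then alert on deviations.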

When a monitor triggers, the data observability tool will raise an alert to the appropriate output channel. Then, the tool may provide additional infrastructure for resolving and understanding issues. For instance, the tool may analyze the lineage of assets to better understand the impact of a data quality issue. Or, the tool may indicate exactly which reports are affected by a data quality issue. The goal of the system is to ensure that the data engineering organization is properly equipped to resolve these issues as they arise, and to reduce the frequency with which they occur.

Monte Carlo Features

Assets

The first thing Monte Carlo (MC) does is construct a searchable catalog of all of the data assets (tables, views, etc.) in the warehouse or lake that you’ve configured. For each of these assets, MC provides a detailed view with a variety of information:

[Image: asset detail view in the Monte Carlo UI]

In addition to cataloging the assets, MC can also compute table- and field-level lineage if it has access to your model management software (e.g., dbt). This lineage provides visibility into how each field in a table is derived. When an issue occurs, the lineage is useful for determining how the issue will propagate downstream. MC also uses the lineage to compute an “importance score” for fields and tables, based on how heavily each is used. When issues occur, they are ranked by importance score, making it easier for data engineering teams to prioritize problems. The MC UI provides a lineage view:

[Image: lineage view in the Monte Carlo UI]
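The downstream-impact analysis described above amounts to a graph traversal over the lineage. A minimal sketch, using a hypothetical table-level lineage (all table names here are made up):

```python
from collections import deque

# Hypothetical table-level lineage: each key feeds the tables it lists.
LINEAGE = {
    "raw_orders":        ["stg_orders"],
    "stg_orders":        ["fct_orders", "orders_daily"],
    "fct_orders":        ["revenue_dashboard"],
    "orders_daily":      [],
    "revenue_dashboard": [],
}

def downstream(table, lineage):
    """Breadth-first walk of the lineage graph, collecting every asset
    that could be affected by a quality issue in `table`."""
    seen, queue = set(), deque(lineage.get(table, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(lineage.get(t, []))
    return seen

print(sorted(downstream("raw_orders", LINEAGE)))
# → ['fct_orders', 'orders_daily', 'revenue_dashboard', 'stg_orders']
```

An importance score could then weight each asset by how many downstream consumers it has, which is one plausible reading of how usage-based ranking works.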

Monitoring

A monitor is an assertion on table state that triggers an incident when its conditions are violated. MC’s primary monitoring styles are:

  • Automatic: Freshness, volume, and schema monitoring is performed on every asset.
  • Opt-In: A variety of provided monitors that users can enable.
  • Custom: User-provided monitoring rules.

The opt-in monitoring provides options for field health, dimension tracking, and JSON schema validation:

[Image: opt-in monitor configuration options]

Custom rules allow the user to specify an arbitrary SQL query that validates a characteristic of the data. MC coordinates the evaluation of all of these rules, and triggers an incident when a monitor is violated.
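To illustrate the shape of a custom rule, here is a sketch that evaluates a user-supplied SQL check against an in-memory SQLite table. The rule, table, and data are all hypothetical; in practice the query runs against your warehouse, and the tool decides what counts as a violation:

```python
import sqlite3

# Hypothetical custom rule: order amounts must be non-negative.
# Any rows returned by the query are treated as violations.
RULE_SQL = "SELECT id FROM orders WHERE amount < 0"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 19.99), (2, -5.00), (3, 42.50)])

violations = conn.execute(RULE_SQL).fetchall()
if violations:
    # A real deployment would raise an incident to the configured channel.
    print(f"rule violated by rows: {[r[0] for r in violations]}")
# → rule violated by rows: [2]
```

The value of a platform like MC is not in running any single query, but in scheduling all such rules, tracking their results over time, and routing violations into the incident workflow.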

Incidents

MC creates an incident when a monitor is triggered. An incident can be assigned to a member of your data team, has a lifecycle, and can be classified by severity. The incident dashboard displays useful context, such as the affected tables and any external notifications that were triggered. The incident page also provides a log of all queries executed against the affected tables during the problematic period, which can be useful for identifying affected systems that are not tracked in Monte Carlo.

[Image: incident dashboard]

Conclusion

Monte Carlo Data provides a well-featured data observability platform, but it is still a bit of a mystery to me how they have become so dominant in such a crowded, competitive field. I suspect a large part of their advantage is a brilliant go-to-market and marketing strategy, which I admire and would love to emulate someday. It will be interesting to see how the observability market plays out for the companies offering products today.