Databricks today launched what it calls its Lakehouse Federation feature at its Data + AI Summit. Using this new capability, enterprises can bring together their various siloed data systems and discover, query and govern their data across a wide variety of platforms, including MySQL and PostgreSQL databases, as well as Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse and Google’s BigQuery, with the governance features powered by Databricks’ Unity Catalog.
“[Lakehouse Federation] is this effort to expand our platform to easily manage and query data in other systems as well,” Databricks co-founder and chief technologist Matei Zaharia told me. One of the core features of this new capability is query federation, he explained, which allows users to connect different data sources and query them efficiently, all while those sources essentially appear as standard databases inside of Databricks.
Often, a company may have real-time data in a PostgreSQL database that powers an app, but an analyst may want to combine this with historical data from a data warehouse and query across both systems. Using Lakehouse Federation, Databricks can now handle the query planning for this (and cache data as needed to keep the system performant).
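To make the idea concrete, here is a minimal sketch of what querying across two systems through one interface looks like. It does not use Databricks or Lakehouse Federation at all: it relies on Python's built-in sqlite3 module, with two in-memory SQLite databases standing in for the operational PostgreSQL database and the data warehouse. The table names, columns and sample rows are all hypothetical.

```python
import sqlite3


def build_federated_connection():
    """Open one connection that can see two separate databases.

    Assumption: 'main' plays the role of the live app database and
    'warehouse' plays the role of the historical data warehouse.
    """
    conn = sqlite3.connect(":memory:")
    # ATTACH adds a second, independent in-memory database under its own name.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

    conn.execute("CREATE TABLE main.orders_live (customer TEXT, amount REAL)")
    conn.execute("CREATE TABLE warehouse.orders_history (customer TEXT, amount REAL)")

    # Hypothetical sample data for illustration only.
    conn.executemany(
        "INSERT INTO main.orders_live VALUES (?, ?)",
        [("acme", 120.0), ("globex", 75.0)],
    )
    conn.executemany(
        "INSERT INTO warehouse.orders_history VALUES (?, ?)",
        [("acme", 900.0), ("acme", 300.0), ("globex", 410.0)],
    )
    return conn


def total_spend_by_customer(conn):
    """Combine live and historical orders in a single SQL statement."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM (
            SELECT customer, amount FROM main.orders_live
            UNION ALL
            SELECT customer, amount FROM warehouse.orders_history
        )
        GROUP BY customer
        ORDER BY customer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_federated_connection()
    for customer, total in total_spend_by_customer(connection):
        print(customer, total)
```

The analyst writes one query and the engine works out how to read from each source. A real federated system additionally has to plan the query across network boundaries, which this single-process toy does not attempt to model.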
Ideally, of course, Databricks would like everyone to use its platform, but the reality is that even though enterprises want to simplify their infrastructure, it’s very hard to move data platforms. “This allows you to at least have a single interface for users and a single place to manage that,” Zaharia explained. Often, companies try to build a system like this in-house, which tends to be costly and complicated (and often fails).
Zaharia also noted that Databricks has an interesting advantage here because its product is built on Apache Spark. The Spark open source ecosystem includes a wide variety of connectors, which Databricks can use to build a product like Lakehouse Federation without having to rebuild much of the core integration tooling.
Another advantage is that Databricks is layering its data governance features on top of this, allowing companies to more easily manage access to their data across platforms. Microsoft, for example, has long bet on the same idea with its Purview governance solution. Now more than ever, data governance is something enterprises are keenly focused on.
“We’re giving organizations access to all of the data they need through one system, which will lead to more innovation — and the best part about that innovation is that it doesn’t sacrifice security. By enabling customers to easily apply the rules consistently across platforms and track data usage, we’ll help them meet compliance requirements while pushing their businesses forward,” said Zaharia.