Databricks is adding generative AI capabilities via Mosaic AI across its data and AI platform, upping its data warehousing game and aiming to get more out of data via business intelligence tools. The launches come a week after Databricks acquired Tabular.

At its Databricks Summit, which arrives a week after Snowflake's customer conference, the company laid out a strategy that revolves around tightly coupling data management and AI. What Databricks is building now are the tools to add governance to data and AI and deliver real insights. The idea is that large language models (LLMs) will let you interact with data through simple queries.

In a briefing, Databricks officials noted that about 85% of the generative AI experiments in enterprises fail to make it to production. The problem in a nutshell revolves around cost at scale, data privacy and getting the right answers out of models. The question is not which model is best in general, but which model delivers the best results for your data.

If you zoom out, the battle between Snowflake and Databricks boils down to this: Snowflake needs to prove it can evolve from a data warehouse and data management vendor to an AI platform. Databricks is a data and AI platform that can also offer data warehouse and business intelligence capabilities. Constellation Research analyst Doug Henschen summed up the race. "Databricks is leading on AI and genAI, but it has a lot to prove on data warehousing and is behind on data marketplace and data apps," he said.

With generative AI, Databricks is creating a Data Intelligence Platform that includes Delta Lake, a unified data storage system; Tabular, which will bridge Databricks with the Iceberg crowd; the generally available Unity Catalog; and Mosaic AI, Databricks SQL, dashboards and other tools. The Databricks Data Intelligence Platform will be 100% serverless.

The vision and Databricks strategy

During a keynote, Databricks CEO Ali Ghodsi said data and AI are converging. "In the last 18 months, every CEO from a Fortune 500 company or small company I've talked with thinks that data and AI is going to be super strategic for them over the next five years. They think that that's how they're going to win," said Ghodsi. "That's going to be the main differentiating factor whether it's the financial sector, retail, media, healthcare or in the public sector. Doesn't matter, all of it. It's going to be data and AI."

Ghodsi said that the last 18 months have only increased the pressure to bring use cases into production. He said:

"There's a food fight inside organizations over who owns an AI. That's number one. Number two, everybody's worried about security and privacy of their data with genAI. They are worried about security and privacy for the whole data estate. And that data estate today is super fragmented."

Ghodsi said the fragmentation can be solved by storing data in open formats so enterprises don't hand data to vendors that'll only lock them in. "Our vision starts with the lakehouse. We said stop giving your data to vendors. It doesn't matter if it's a proprietary data warehouse in the cloud, Snowflake or Databricks. Don't give it to us either."

Databricks' acquisition of Tabular was designed to create a USB-like standard for data. Ghodsi said interoperability between Delta Lake and Iceberg will solve a lot of enterprise problems. He added that Databricks will work with the Delta Lake and Iceberg communities to bring the formats together over time. He likened the competition between data lakehouse formats to a Betamax vs. VHS type of challenge.

The other broad theme from Databricks was combining its Lakehouse platform with Mosaic AI. Data intelligence will enable enterprises to ask common questions with genAI and create an AI-meets-business intelligence format. "That's what the whole company is working on," said Ghodsi.

Nvidia CEO Jensen Huang also joined Ghodsi at the keynote and delivered a few interesting nuggets:

  • Huang argued that open source LLMs "were probably the most important events this year" because they enable enterprises to better leverage genAI.
  • GenAI will have the most impact on customer service. "Customer service represents probably several trillion dollars worth of expenses and every company is deciding between the chatbot or the customer service agent. It is partly about the fact that you could automate but it's mostly about the data flywheel," said Huang. "You want to capture the engagement for the data flywheel. We're going to have proactive customer support."
  • AI factories shouldn't be built near populations where the energy grid is already challenged. "Earth has a lot more energy," said Huang. "It's just in the wrong places."

News overview

The news out of Databricks Summit adds up to building out a data and AI platform that's accessible. Here's a look at a few of the announcements.

  • Mosaic AI Agent Framework is in preview and includes tools to build an agent application, evaluate it with human and LLM judges, and deploy it with real-time APIs.
  • Mosaic AI Model Training, which fine-tunes small open-source models. Databricks said customers in private preview can use smaller models to cut both costs and latency. Some customers are seeing 10x improvements in inference costs and 2x improvements in latency.
  • A text-to-image model trained on Shutterstock data by Mosaic AI.

  • Support for usage tracking, rate limits and guardrails as well as hooks into Unity Catalog.
  • Databricks SQL is now 70% faster, and Databricks showed comparisons of its price/performance vs. Snowflake.
  • Databricks AI/BI, with dashboards generally available. A tool called Genie, which lets you query data, is in public preview. To date, Databricks hasn't tried to tackle BI use cases, but going forward BI will be a focus.

  • Genie will learn from your data and semantics and feature an ensemble of AI agents to leverage Unity Catalog metadata. Genie will also query data across all workloads and related assets, remember and learn from past interactions, and seek clarification when needed.
  • Lakeflow Connectors will ingest data from SaaS applications and databases. Lakeflow won't have all of its components right away, but Databricks said there will be a steady cadence over the next 12 months. The aim is to simplify the data engineering process.
  • Unity Catalog OSS, an open catalog that's available now and combines data and AI with interoperability across formats, open APIs and governance.
  • Enhanced Federation, which includes Lakehouse Federation to connect data sources to Unity Catalog with policies, and Hive Metastore Federation, which can read from and write to an internal or external Hive Metastore or AWS Glue.
  • Secure Collaboration via Clean Rooms and Foreign Catalog Sharing.
  • Business Metrics, which will pull from your lakehouse assets, leverage a central inventory of certified metrics and make them accessible.

The launches at Databricks Summit lean heavily on combining genAI and data warehousing, alongside the company's usual data engineering fare. The case Databricks is making is that it can consolidate your data platforms and silos. In the end, Snowflake and Databricks will compete for customers.

Constellation Research analyst Holger Mueller said:

"CxOs always need to remind themselves where the vendor came from. Snowflake came out of the data warehouse and showed it could add cloud elasticity. A large part of Snowflake's success was the familiarity with data warehousing. Databricks lived in the big data and cloud world from its inception in a model less familiar to CxOs. If Snowflake manages to add good enough lakehouse capabilities soon, it will win as it has the transactional data. If Snowflake is slow or fumbles, it's Databricks' game to win because the ability to master large amounts of unstructured data is the harder engineering challenge and Databricks has mastered it."
