The impact of artificial intelligence will never be equitable if there’s only one company that builds and controls the models (not to mention the data that go into them). Unfortunately, today’s AI models are made up of billions of parameters that must be trained and tuned to maximize performance for each use case, putting the most powerful AI models out of reach for most people and companies.
MosaicML started with a mission to make those models more accessible. The company, which counts Jonathan Frankle PhD ’23 and MIT Associate Professor Michael Carbin as co-founders, developed a platform that let users train, improve, and monitor open-source models using their own data. The company also built its own open-source models using graphics processing units (GPUs) from Nvidia.
The approach made deep learning, a nascent field when MosaicML first began, accessible to far more organizations as excitement around generative AI and large language models (LLMs) exploded following the release of ChatGPT. It also made MosaicML a powerful complementary tool for data management companies that were likewise committed to helping organizations make use of their data without giving it to AI companies.
Last year, that reasoning led to the acquisition of MosaicML by Databricks, a global data storage, analytics, and AI company that works with some of the largest organizations in the world. Since the acquisition, the combined companies have released one of the highest-performing open-source, general-purpose LLMs yet built. Known as DBRX, this model has set new benchmarks in tasks like reading comprehension, general-knowledge questions, and logic puzzles.
Since then, DBRX has gained a reputation for being one of the fastest open-source LLMs available and has proven especially useful at large enterprises.
More than the model, though, Frankle says DBRX is significant because it was built using Databricks tools, meaning any of the company’s customers can achieve similar performance with their own models, which will accelerate the impact of generative AI.
“Honestly, it’s just exciting to see the community doing cool things with it,” Frankle says. “For me as a scientist, that’s the best part. It’s not the model, it’s all the amazing stuff the community is doing on top of it. That’s where the magic happens.”
Making the construction of LLMs more accessible to the broader community is a good thing.