OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long.
Based on four sample videos that OpenAI shared with MIT Technology Review ahead of today’s announcement, the San Francisco–based firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).
“We think building models that can understand video, and understand all these very complex interactions of our world, is an important step for all future AI systems,” says Tim Brooks, a scientist at OpenAI.
But there’s a disclaimer. OpenAI gave us a preview of Sora (which means sky in Japanese) under conditions of strict secrecy. In an unusual move, the firm would only share information about Sora if we agreed to wait until after news of the model was made public to seek the opinions of outside experts. [Editor’s note: We’ve updated this story with outside comment below.] OpenAI has not yet released a technical report or demonstrated the model actually working. And it says it won’t be releasing Sora anytime soon. [Update: OpenAI has now shared more technical details on its website.]
To read more, click here.