Vidu, China’s new text-to-video AI, excels at generating pandas and dragons



Summary

Shengshu Technology and Tsinghua University unveil Vidu, their first Sora-like AI model for text-to-video creation, but it still falls short of OpenAI’s impressive video debut.

Chinese AI company Shengshu Technology and Tsinghua University unveiled Vidu at the Zhongguancun Forum 2024 in Beijing. Vidu can create a 16-second HD video at 1080p resolution with a single click, and is “very close” to the level of OpenAI’s Sora model, according to Shengshu Technology.

Compared to Sora, Vidu is said to better “understand and generate Chinese elements such as the panda and dragon”, a claim that has yet to be proven in practice. Shengshu Technology also says that the core architecture of the model was developed in September 2022, before the launch of Sora, China Daily reports.

Video: Shengshu Technology via Reddit

Despite the confidence of its developers, the quality of Vidu seems to lag behind that of Sora. The most significant difference is that while Sora can generate continuous videos of up to one minute, Vidu currently manages only 16 seconds.

Although Shengshu Technology promises “exceptional consistency” within these scenes, meaning that the individual frames build on each other logically, Vidu is still far from matching Sora’s capabilities. One reason could be that Chinese developers have more limited access to GPUs than OpenAI does.

With Vidu, however, China is demonstrating its serious ambitions to catch up with or even surpass leading US companies such as OpenAI in the race for generative AI models. This will require a significant increase in performance.

Sora is expected to be released this year, with OpenAI planning to scale the model further. Details on pricing and generation times are not yet known.
