Tencent has introduced Voyager, an impressive new AI model that can transform a single photograph into a three-dimensional scene. The model generates an RGB video and matching depth information at the same time, offering a powerful approach to 3D reconstruction without the need for traditional modeling techniques. However, it needs a large amount of GPU memory to run effectively.
How Voyager Works

The HunyuanWorld-Voyager model takes a single image and a user-defined camera path (such as a pan, tilt, or dolly-in motion) and generates a short video. It produces the video and a matching depth map together, so the spatial relationships of objects in the scene remain consistent. The system maintains geometric coherence by checking each new frame against previously generated content using 3D point clouds. However, distortions can still occur with long or complex camera movements, particularly with 360-degree rotations.
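To make the role of the depth map concrete, here is a minimal sketch of how a pixel with a known depth can be lifted into 3D and re-projected after the camera moves. It assumes a simple pinhole camera with hypothetical intrinsics (`FX`, `FY`, `CX`, `CY`) and a camera that only translates forward. It illustrates the general geometry behind point-cloud consistency checks, not Tencent's implementation.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis to a 3D point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates back to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def dolly_in(point, distance):
    """Express a point in the frame of a camera moved `distance` forward along +z."""
    x, y, z = point
    return (x, y, z - distance)


# Hypothetical intrinsics for a 960x540 frame (assumed values, not from Voyager).
FX = FY = 500.0
CX, CY = 480.0, 270.0

# A pixel 100 px right of centre whose depth map value is 4 units.
p = unproject(580.0, 270.0, 4.0, FX, FY, CX, CY)

# Where the same scene point should appear after a 2-unit dolly-in.
u2, v2 = project(dolly_in(p, 2.0), FX, FY, CX, CY)
```

In this toy setup the point moves from `u = 580` to about `u = 680`, drifting away from the image centre as the camera approaches. A generated frame that placed the same object somewhere else would be geometrically inconsistent with the earlier one.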
Tencent’s technical report highlights an additional component called the “world cache,” which stores data from each new frame. This data can be reused in subsequent frames, helping to preserve geometric consistency across sequences that last several minutes.
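The article does not describe how the world cache is structured, so the following is only a toy illustration of the general idea: accumulate 3D points from each frame and reuse them later. The class name `WorldCache`, the voxel-based deduplication, and the `voxel_size` parameter are all assumptions made for this sketch.

```python
class WorldCache:
    """Toy point cache: keeps one point per voxel so repeated views add no duplicates."""

    def __init__(self, voxel_size=0.25):
        self.voxel_size = voxel_size
        self._points = {}  # voxel index -> (x, y, z, frame_id)

    def _key(self, point):
        # Floor-divide each coordinate to find the voxel that contains the point.
        return tuple(int(c // self.voxel_size) for c in point)

    def add_frame(self, frame_id, points):
        """Insert a frame's 3D points and return how many were new to the cache."""
        added = 0
        for p in points:
            key = self._key(p)
            if key not in self._points:
                self._points[key] = (*p, frame_id)
                added += 1
        return added

    def __len__(self):
        return len(self._points)


cache = WorldCache(voxel_size=0.25)
new_first = cache.add_frame(0, [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0)])
# The first point below lands in a voxel already filled by frame 0.
new_second = cache.add_frame(1, [(0.01, 0.0, 2.01), (1.0, 0.0, 2.0)])
```

Here the second frame contributes only one new point, because the other falls into a voxel that frame 0 already filled. A real system would store far richer data, but the principle of reusing earlier geometry rather than regenerating it is the same.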
Training and Requirements

Voyager was trained on a massive dataset of over 100,000 real and synthetic video clips, including scenes from Unreal Engine environments. This extensive training helped the model understand various camera movements. The training process used an automated depth estimation method, eliminating the need for manual labeling.
While technologically powerful, Voyager has high hardware requirements. Running the model at a 540p resolution requires 60 GB of GPU memory, and optimal results need 80 GB. The system supports multi-GPU scaling, with an 8-GPU setup running approximately 6.7 times faster than a single GPU. The model weights have been made available to researchers on Hugging Face.
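The scaling figure is easier to interpret as a parallel efficiency. The short sketch below works through that arithmetic using the numbers quoted above. The helper names `parallel_efficiency` and `fits_in_memory` are hypothetical, and the memory check is a simple threshold comparison rather than a real capacity test.

```python
def parallel_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup / num_gpus


def fits_in_memory(gpu_memory_gb, required_gb=60):
    """True if a GPU meets the stated 60 GB memory floor for 540p output."""
    return gpu_memory_gb >= required_gb


# Figures quoted in the article: 6.7x speedup on 8 GPUs.
efficiency = parallel_efficiency(6.7, 8)
```

A 6.7x speedup on eight GPUs works out to roughly 84% of ideal linear scaling. By the same arithmetic, a card with 24 GB of memory falls well short of the 60 GB floor, while an 80 GB card meets the figure given for optimal results.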
Voyager vs. Other AI Models
Voyager’s approach sets it apart from existing video generation models. Unlike OpenAI’s Sora, which focuses on visual realism, Voyager prioritizes geometric consistency between frames. This focus helped it achieve a top score of 77.62 on Stanford’s WorldScore benchmark, outperforming competitors like WonderWorld and CogVideoX-I2V. However, it still has some limitations in precise camera control.
Additionally, there are some licensing restrictions for Voyager. Its use is prohibited in the European Union, the United Kingdom, and South Korea. Commercial applications serving over 100 million active users require an additional agreement.