AI That Turns Photos into 3D Worlds: Tencent Voyager


Tencent has introduced Voyager, an impressive new AI model that can transform a single photograph into a three-dimensional scene. The model simultaneously generates both an RGB video and depth information, offering a powerful approach to 3D reconstruction without the need for traditional modeling techniques. However, it requires a significant amount of hardware to run effectively.

How Voyager Works

The HunyuanWorld-Voyager model takes a single image and a user-defined camera path—such as a pan, tilt, or dolly-in motion—to generate a short video. It produces both the video and a simultaneous depth map, ensuring that the spatial relationships of objects in the scene remain consistent. The system maintains geometric coherence by comparing each new frame with the previous content using 3D point clouds. However, distortions can still occur with long or complex camera movements, particularly with 360-degree rotations.
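To make the link between a depth map and a 3D point cloud concrete, here is a minimal sketch of back-projecting per-pixel depth into camera-space points with a simple pinhole camera model. It is an illustration only, not Voyager's actual code: the function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the tiny 2x2 depth map is made up for the example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Turn a depth map (rows of per-pixel depths) into camera-space 3D points.

    Assumes a pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. These values are illustrative.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v = pixel row
        for u, z in enumerate(row):        # u = pixel column, z = depth
            if z <= 0:                     # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Made-up 2x2 depth map; the 0.0 entry stands for a missing depth value.
depth_map = [[2.0, 2.0], [0.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Points produced this way from successive frames can be compared in a shared 3D space, which is the general idea behind checking a new frame against earlier content.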

Tencent's technical report highlights an additional component called the “world cache,” which stores data from each new frame. That data can then be reused in subsequent frames, which helps preserve geometric consistency across videos that are several minutes long.
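The article gives no details on how the world cache is built, so the following is only a toy sketch of the general idea of storing per-frame geometry for later reuse. The class name `WorldCache`, the voxel-based deduplication, and the `voxel_size` parameter are all assumptions made for illustration, not Tencent's design.

```python
import math
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


class WorldCache:
    """Toy point store (hypothetical): keeps one point per voxel cell,
    so geometry seen again in later frames is reused, not duplicated."""

    def __init__(self, voxel_size: float = 0.5) -> None:
        self.voxel_size = voxel_size
        self._points: Dict[VoxelKey, Point3D] = {}

    def _key(self, p: Point3D) -> VoxelKey:
        # Map a point to the integer index of the voxel that contains it.
        s = self.voxel_size
        return (math.floor(p[0] / s), math.floor(p[1] / s), math.floor(p[2] / s))

    def add_frame(self, points: List[Point3D]) -> int:
        """Insert one frame's points; return how many were new to the cache."""
        added = 0
        for p in points:
            key = self._key(p)
            if key not in self._points:
                self._points[key] = p
                added += 1
        return added

    def points(self) -> List[Point3D]:
        """Return every cached point, for use when generating later frames."""
        return list(self._points.values())


cache = WorldCache(voxel_size=0.5)
new_in_first = cache.add_frame([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
# The first point below falls in an already occupied voxel; the second is new.
new_in_second = cache.add_frame([(0.1, 0.1, 1.1), (2.0, 0.0, 1.0)])
```

The design choice here is simply that the cache grows only when a frame reveals geometry that was not seen before, which keeps the stored scene compact as the video gets longer.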

Training and Requirements

Voyager was trained on a massive dataset of over 100,000 real and synthetic video clips, including scenes from Unreal Engine environments. This extensive training helped the model understand various camera movements. The training process used an automated depth estimation method, eliminating the need for manual labeling.

While technologically powerful, Voyager has high hardware requirements. Running the model at a 540p resolution requires 60 GB of GPU memory, and optimal results need 80 GB. The system supports multi-GPU scaling, with an 8-GPU setup running approximately 6.7 times faster than a single GPU. The model weights have been made available to researchers on Hugging Face.
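The multi-GPU figure can be read as a parallel efficiency: a 6.7x speedup on 8 GPUs means each GPU contributes about 84% of what perfect linear scaling would give (6.7 / 8 = 0.8375). The short sketch below just works through that arithmetic; the helper name `parallel_efficiency` is a hypothetical one chosen for this example.

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 would be perfect)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Figures from the article: 8 GPUs run about 6.7 times faster than one.
efficiency = parallel_efficiency(6.7, 8)
print(f"Parallel efficiency: {efficiency:.1%}")
```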

Voyager vs. Other AI Models

Voyager’s approach sets it apart from existing video generation models. Unlike OpenAI’s Sora, which focuses on visual realism, Voyager prioritizes geometric consistency between frames. This focus helped it achieve a top score of 77.62 on Stanford’s WorldScore benchmark, outperforming competitors like WonderWorld and CogVideoX-I2V. However, it still has some limitations in precise camera control.

Additionally, there are some licensing restrictions for Voyager. Its use is prohibited in the European Union, the United Kingdom, and South Korea. Commercial applications serving over 100 million active users require an additional agreement.
