Pusan National University
*Equal Contribution †Corresponding Author
Overview of LivingWorld. LivingWorld generates dynamic 4D scenes from a single image by progressively expanding the scene while maintaining globally coherent environmental dynamics. We demonstrate results with scene-scale motions such as water, clouds, and smoke, rendered under moving viewpoints. Our method preserves spatial and temporal consistency across views, enabling interactive and physically plausible 4D world generation.
We introduce LivingWorld, an interactive framework for generating 4D worlds with environmental dynamics from a single image. While recent advances in 3D scene generation enable large-scale environment creation, most approaches focus primarily on reconstructing static geometry, leaving scene-scale environmental dynamics such as clouds, water, or smoke largely unexplored. Modeling such dynamics is challenging because motion must remain coherent across an expanding scene while supporting low-latency user feedback. LivingWorld addresses this challenge by progressively constructing a globally coherent motion field as the scene expands. To maintain global consistency during expansion, we introduce a geometry-aware alignment module that resolves directional and scale ambiguities across views. We further represent motion using a compact hash-based motion field, enabling efficient querying and stable propagation of dynamics throughout the scene. This representation also supports bidirectional motion propagation during rendering, producing long and temporally coherent 4D sequences without relying on expensive video-based refinement. On a single RTX 5090 GPU, each scene expansion step takes 9 seconds to generate, followed by 3 seconds for motion alignment and motion field updates, enabling interactive 4D world generation with globally coherent environmental dynamics. Additional dynamic results are provided in the supplementary video.
LivingWorld constructs a globally consistent motion field during scene expansion. Starting from a single image, we progressively expand the scene and estimate motion from user interactions. To ensure consistency across views, we introduce a geometry-aware alignment module that resolves directional and scale ambiguities. We further represent motion using a compact hash-based field, enabling efficient and stable propagation of dynamics. This allows temporally coherent and view-consistent 4D world generation.
Our alignment module resolves directional and scale ambiguities across views. By matching overlapping regions between newly expanded and existing scenes, we enforce globally consistent motion during scene growth. This enables stable integration of motion across the entire scene.
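As a concrete illustration of resolving directional and scale ambiguities, the sketch below estimates a single global rotation and scale that best maps motion vectors sampled in a newly expanded view onto the corresponding vectors of the existing scene, via a Kabsch/Umeyama-style least-squares fit over the overlap region. The function name and the least-squares formulation are our assumptions, not the paper's actual module.

```python
import numpy as np

def align_motion(existing, new):
    """Estimate a global rotation R and scale s such that
    existing_i = s * R @ new_i for (N, 3) motion vectors sampled at
    matched points in the overlap between the existing scene and a
    newly expanded view. Illustrative Kabsch/Umeyama-style sketch,
    not the paper's actual alignment module."""
    # Cross-covariance of the two vector sets (no centering needed:
    # motion vectors carry no translational component).
    H = new.T @ existing
    U, S, Vt = np.linalg.svd(H)
    # Reflection guard keeps R a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Scale that best matches motion magnitudes after rotation.
    scale = (S * np.array([1.0, 1.0, d])).sum() / np.sum(new ** 2)
    return scale, R
```

Applying the estimated (scale, R) to the new view's motion vectors brings them into the existing scene's motion frame before they are merged into the global field.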
We represent motion as a continuous hash-based field, enabling efficient querying and stable propagation of dynamics. The field supports bidirectional motion propagation, producing long and temporally coherent 4D sequences during rendering.
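One way to realize such a compact hash-based field is an Instant-NGP-style spatial hash with trilinear interpolation, sketched below. The class name, table size, and single-resolution design are illustrative assumptions rather than the actual implementation.

```python
import numpy as np

class HashMotionField:
    """Minimal single-level sketch of a hash-based motion field
    (Instant-NGP-style spatial hashing with trilinear interpolation).
    Illustrative only: in practice the table entries would be
    optimized and multiple resolution levels would be combined."""

    def __init__(self, table_size=2**16, resolution=64, seed=0):
        rng = np.random.default_rng(seed)
        # One 3D motion vector per hash slot (trainable in practice).
        self.table = 1e-2 * rng.standard_normal((table_size, 3))
        self.size = table_size
        self.res = resolution

    def _hash(self, idx):
        # Spatial hash over integer grid coordinates.
        idx = idx.astype(np.uint64)
        h = idx[..., 0] ^ (idx[..., 1] * np.uint64(2654435761)) \
            ^ (idx[..., 2] * np.uint64(805459861))
        return (h % np.uint64(self.size)).astype(np.int64)

    def query(self, x):
        """Query motion vectors at points x in [0, 1]^3; returns (N, 3)."""
        g = x * (self.res - 1)
        g0 = np.floor(g).astype(np.int64)
        w = g - g0  # trilinear interpolation weights
        out = np.zeros((len(x), 3))
        for corner in range(8):  # 8 corners of the enclosing grid cell
            off = np.array([(corner >> k) & 1 for k in range(3)])
            cw = np.prod(np.where(off, w, 1.0 - w), axis=-1, keepdims=True)
            out += cw * self.table[self._hash(g0 + off)]
        return out
```

Because a query is only a handful of table lookups per point, motion can be evaluated cheaply at render time and updated locally as the scene expands.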
While LivingWorld primarily models scene-scale environmental dynamics such as water, clouds, and smoke, it can also incorporate object-centric motion. Rather than representing all motion within a single field, we align object and environmental motions in world coordinates by enforcing consistent scale and spatial alignment. This allows independently modeled motions to be coherently integrated within the same scene, maintaining consistency across viewpoints and scene expansion.
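For illustration, integrating an independently modeled object motion with the environmental field amounts to applying a similarity transform (scale s, rotation R, translation t) that maps the object's local frame into world coordinates before merging. The function and parameter names below are hypothetical.

```python
import numpy as np

def object_motion_to_world(pts_obj, vec_obj, scale, R, t):
    """Map an object's (N, 3) sample points and motion vectors from its
    local frame into world coordinates via a similarity transform
    (scale, R, t), so they can be merged with the environmental motion
    field. Hypothetical sketch; the transform parameters are assumed
    to come from the scene-alignment step."""
    pts_world = scale * pts_obj @ R.T + t  # positions: full similarity
    vec_world = scale * vec_obj @ R.T      # vectors: translation drops out
    return pts_world, vec_world
```

Transforming motion vectors without the translation term is what keeps independently modeled motions consistent in scale and direction once placed in the shared world frame.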
Values indicate the percentage of participants who preferred LivingWorld over each baseline.