LivingWorld: Interactive 4D World Generation with Environmental Dynamics

Pusan National University
*Equal Contribution Corresponding Author
ECCV 2026 in Malmö, Sweden

Interactive Dynamic Scene Generation

Rendering Generated Worlds

LivingWorld generates dynamic 3D worlds from a single image and user-specified motion. Here, we present rendered results with moving cameras and environmental dynamics.

“A glacial river winds through a mountain valley.”
“A broad waterfall churns mist into the river below.”
“A fast alpine river winds through a pine forest beneath sharp peaks.”
“Low clouds drift across sunlit hills and valley roads.”
“Wildfire flames and heavy smoke roll across a mountain slope.”

Explore Generated Worlds Yourself

Load our saved Gaussian worlds directly in the browser and explore them in real time, with rendering performed on the viewer's device.

W/A/S/D: move through the Gaussian world. I/J/K/L: look around the generated scene.

Integration with Object-Centric Motion

While LivingWorld primarily models scene-scale environmental dynamics such as water, clouds, and smoke, it can also incorporate object-centric motion. Rather than representing all motion within a single field, we align object and environmental motions in world coordinates by enforcing consistent scale and spatial alignment. This allows independently modeled motions to be coherently integrated within the same scene, maintaining consistency across viewpoints and scene expansion.
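The alignment described above can be illustrated with a similarity transform that maps an object's local trajectory into the world frame shared with the environmental motion. This is only a minimal sketch under stated assumptions, not the authors' implementation: the helper name `to_world`, the yaw-only rotation, and all example values are hypothetical.

```python
import math

def to_world(points, scale, yaw_rad, translation):
    """Map object-local 3D points into world coordinates.

    Hypothetical helper: applies a uniform scale, then a rotation about
    the vertical (y) axis, then a translation.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    world = []
    for x, y, z in points:
        # Uniform scale so the object motion uses the same units as the scene.
        x, y, z = scale * x, scale * y, scale * z
        # Rotate about the y axis, then move to the object's world position.
        xr = c * x + s * z
        zr = -s * x + c * z
        world.append((xr + tx, y + ty, zr + tz))
    return world

# Illustrative object trajectory in its own local frame (made-up values).
local_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.5, 0.0)]
world_path = to_world(
    local_path, scale=0.5, yaw_rad=math.pi / 2, translation=(10.0, 1.0, -3.0)
)
```

Because the object trajectory and the environmental motion are both expressed in the same world frame after this step, they can be rendered from any camera pose without a per-view correction.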

Comparisons with Baseline Methods

We compare LivingWorld against video and 4D scene generation baselines. Video models often suffer from inconsistent 3D geometry, unnatural motion dynamics, and limited camera controllability, while 4D scene baselines are limited in scene scale, motion diversity, or runtime. LivingWorld maintains a coherent 3D representation with physically consistent dynamics and precise control over camera motion.

Quantitative Evaluation

We evaluate LivingWorld against video- and 4D-scene-generation baselines on 60 expanded scenes. VBench measures imaging quality, aesthetic quality, motion smoothness, and temporal flicker; PhysReal is a GPT-based assessment of physical realism; Runtime is the per-scene generation time. LivingWorld attains the best or comparable scores across all dimensions, ranking first in aesthetic quality, temporal flicker, and PhysReal, while generating each scene in 12 seconds, roughly 12 to 300 times faster than the baselines (140 to 3580 seconds).

Category  Method             VBench (↑)                            PhysReal (↑)  Runtime (↓)
                             Imaging  Aesthetic  Motion  Flicker
Video     Veo 3.1            0.694    0.625      0.992   0.979    0.622         140
          CogVideoX          0.677    0.611      0.991   0.983    0.575         1510
          Tora               0.649    0.609      0.992   0.976    0.571         550
4D Scene  4DGS-Cinemagraphy  0.637    0.604      0.996   0.988    0.605         1980
          PerpetualWonder    0.553    0.553      0.979   0.972    0.554         3580
          LivingWorld        0.673    0.639      0.995   0.989    0.655         12

Bold: best   underline: second best   Runtime is in seconds.
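To make the runtime gap concrete, the speedup of LivingWorld over each baseline can be computed directly from the Runtime column. The short script below is only an arithmetic sketch; the variable names are ours, and the numbers are copied from the table.

```python
# Per-scene runtimes in seconds, copied from the Runtime column.
runtime_s = {
    "Veo 3.1": 140,
    "CogVideoX": 1510,
    "Tora": 550,
    "4DGS-Cinemagraphy": 1980,
    "PerpetualWonder": 3580,
}
livingworld_s = 12

# Speedup = baseline runtime divided by LivingWorld runtime.
speedup = {name: t / livingworld_s for name, t in runtime_s.items()}

for name, ratio in sorted(speedup.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.1f}x")
# Veo 3.1: 11.7x
# Tora: 45.8x
# CogVideoX: 125.8x
# 4DGS-Cinemagraphy: 165.0x
# PerpetualWonder: 298.3x
```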

Human Study (2AFC)

Values indicate the percentage of participants who preferred LivingWorld over each baseline. Outer arcs show 95% Wilson confidence intervals over participants.
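For readers unfamiliar with the Wilson score interval, the sketch below shows how such an interval is commonly computed for a preference rate. It is a generic textbook formula, not the authors' analysis script; the function name `wilson_interval` and the counts passed to it are placeholders and are not taken from the study.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    # The interval is centered on a value shrunk toward 0.5, not on p_hat.
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# Placeholder counts for illustration only (not the study's data).
low, high = wilson_interval(successes=68, n=100)
print(f"preference 68%, 95% CI [{low:.3f}, {high:.3f}]")
```

Note that a Wilson interval is not symmetric around the observed percentage, so a single ± value, as reported here, presumably corresponds to the half-width of the interval.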

95-participant Study

Baseline         Imaging    Aesthetic  Motion     Flicker
CogVideoX        68% ±9.0   73% ±8.6   66% ±9.1   78% ±8.0
Tora             77% ±8.2   79% ±7.9   76% ±8.3   84% ±7.2
Veo 3.1          58% ±9.5   66% ±9.1   72% ±8.7   75% ±8.4
4DGS-Cinematic   73% ±8.6   75% ±8.4   68% ±9.0   74% ±8.5
PerpetualWonder  82% ±7.5   87% ±6.6   85% ±7.0   92% ±5.4

Interactivity User Study

We conduct an interactivity user study to evaluate our GUI for interactive 4D world generation. Participants used our interface to author and explore dynamic scenes, and rated their experience on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree) across three dimensions: Usability, Controllability, and Usefulness. LivingWorld achieves consistently positive ratings across all dimensions, suggesting that the added 4D controls remain practically usable.

Dimension        Rating (1 to 7)
Usability        5.65 (±1.25)
Controllability  5.90 (±1.42)
Usefulness       6.00 (±1.21)
Avg.             5.85 (±1.30)

Citation

@article{mun2026livingworld,
  title={LivingWorld: Interactive 4D World Generation with Environmental Dynamics},
  author={Mun, Hyeongju and Jin, In-Hwan and Kim, Sohyeong and Kong, Kyeongbo},
  journal={European Conference on Computer Vision},
  year={2026}
}