LivingWorld

Rendering Generated Worlds

LivingWorld generates dynamic 3D worlds from a single image and user-specified motion. Here, we present rendered results with moving cameras and environmental dynamics.

“A glacial river winds through a mountain valley.”

“A broad waterfall churns mist into the river below.”

“A fast alpine river winds through a pine forest beneath sharp peaks.”

“Low clouds drift across sunlit hills and valley roads.”

“Wildfire flames and heavy smoke roll across a mountain slope.”

Integration with Object-Centric Motion

While LivingWorld primarily models scene-scale environmental dynamics such as water, clouds, and smoke, it can also incorporate object-centric motion. Rather than representing all motion within a single field, we align object and environmental motions in world coordinates by enforcing consistent scale and spatial alignment. This allows independently modeled motions to be coherently integrated within the same scene, maintaining consistency across viewpoints and scene expansion.
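One common way to express "consistent scale and spatial alignment" is a similarity transform (uniform scale, rotation, translation) that maps an object's local trajectory into the shared world frame. The sketch below illustrates that idea only; it is not taken from the LivingWorld implementation, and the function name, the choice of a yaw-only rotation, and all numeric values are assumptions made for illustration.

```python
import math

def similarity_transform(points, scale, yaw_deg, translation):
    """Map object-local 3D points into world coordinates.

    Computes p_world = scale * R_y(yaw) * p_local + translation, where R_y is a
    rotation about the vertical (y) axis. All parameters are illustrative.
    """
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    world = []
    for x, y, z in points:
        # Rotate about the y axis, then scale, then translate.
        xr = c * x + s * z
        zr = -s * x + c * z
        world.append((scale * xr + tx, scale * y + ty, scale * zr + tz))
    return world

# Hypothetical object trajectory in its own local frame.
local_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

# Assumed alignment: 0.5 world units per local unit, rotated 90 degrees,
# anchored at (10, 0, -3) in the world frame.
world_path = similarity_transform(
    local_path, scale=0.5, yaw_deg=90.0, translation=(10.0, 0.0, -3.0)
)
```

Once the object path is expressed in the same frame and at the same scale as the environmental motion, both can be rendered from any camera without per-view correction, which is the property the paragraph above describes.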

Comparisons with Baseline Methods

We compare LivingWorld against video and 4D scene generation baselines. Video models often suffer from inconsistent 3D geometry, unnatural motion dynamics, and limited camera controllability, while 4D scene baselines are limited in scene scale, motion diversity, or runtime. LivingWorld maintains a coherent 3D representation with physically consistent dynamics and precise control over camera motion.

Quantitative Evaluation

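We evaluate LivingWorld against video- and 4D-scene-generation baselines on 60 expanded scenes. VBench measures imaging quality, aesthetic quality, motion smoothness, and temporal flicker; PhysReal is a GPT-based assessment of physical realism; Runtime is the per-scene generation time. LivingWorld achieves the best scores on aesthetic quality, temporal flicker, and PhysReal, ranks second on motion smoothness, and stays competitive on imaging quality (third, behind Veo 3.1 and CogVideoX). It generates a scene in 12 s, roughly 12× to 300× faster than the baselines (140 to 3580 s).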

Category	Method	VBench Imaging (↑)	VBench Aesthetic (↑)	VBench Motion (↑)	VBench Flicker (↑)	PhysReal (↑)	Runtime, s (↓)
Video	Veo 3.1	0.694	0.625	0.992	0.979	0.622	140
	CogVideoX	0.677	0.611	0.991	0.983	0.575	1510
	Tora	0.649	0.609	0.992	0.976	0.571	550
4D Scene	4DGS-Cinemagraphy	0.637	0.604	0.996	0.988	0.605	1980
	PerpetualWonder	0.553	0.553	0.979	0.972	0.554	3580
	LivingWorld	0.673	0.639	0.995	0.989	0.655	12

Bold: best; underline: second best. Runtime is in seconds.
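The runtime gap can be made concrete by dividing each baseline's per-scene runtime by LivingWorld's. The short script below does this arithmetic with the values from the Runtime column of the table; the variable names are our own.

```python
# Per-scene runtimes in seconds, copied from the Runtime column of the table.
runtime_s = {
    "Veo 3.1": 140,
    "CogVideoX": 1510,
    "Tora": 550,
    "4DGS-Cinemagraphy": 1980,
    "PerpetualWonder": 3580,
    "LivingWorld": 12,
}

ours = runtime_s["LivingWorld"]

# Speedup of LivingWorld over each baseline (baseline time / LivingWorld time).
speedup = {
    name: t / ours for name, t in runtime_s.items() if name != "LivingWorld"
}

# Print from the smallest to the largest speedup.
for name, ratio in sorted(speedup.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.1f}x")
```

The ratios range from about 12× (Veo 3.1) to about 300× (PerpetualWonder), so the speedup is between one and two orders of magnitude depending on the baseline.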

Human Study (2AFC)

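Figure: 2AFC preference results from a 95-participant study, comparing LivingWorld with CogVideoX, Tora, Veo 3.1, 4DGS-Cinemagraphy, and PerpetualWonder. Panels: Motion, Flicker.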

Interactivity User Study

We conduct an interactivity user study to evaluate our GUI for interactive 4D world generation. Participants used our interface to author and explore dynamic scenes, and rated their experience on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree) across three dimensions: Usability, Controllability, and Usefulness. LivingWorld achieves consistently positive ratings across all dimensions, suggesting that the added 4D controls remain practically usable.
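For readers who want to summarize ratings of this kind, the sketch below computes a mean rating and the share of above-neutral ratings per dimension. The responses are hypothetical placeholders, not data from the study, and the `summarize` helper and the choice of summary statistics are our own assumptions rather than the authors' analysis.

```python
from statistics import mean

# Hypothetical responses on the 7-point scale
# (1 = Strongly Disagree, 7 = Strongly Agree).
# These numbers are placeholders, NOT data from the study.
responses = {
    "Usability": [6, 5, 7, 6, 4],
    "Controllability": [5, 6, 6, 7, 5],
    "Usefulness": [7, 6, 6, 5, 6],
}

NEUTRAL = 4  # midpoint of a 7-point scale

def summarize(scores):
    """Return (mean rating, share of ratings above the neutral midpoint)."""
    if any(not 1 <= s <= 7 for s in scores):
        raise ValueError("ratings must lie in 1..7")
    positive = sum(1 for s in scores if s > NEUTRAL)
    return mean(scores), positive / len(scores)

summary = {dim: summarize(scores) for dim, scores in responses.items()}

for dim, (avg, share) in summary.items():
    print(f"{dim}: mean={avg:.2f}, above neutral={share:.0%}")
```

Reporting the share of above-neutral ratings next to the mean helps because Likert data is ordinal, and a mean alone can hide a split between strongly positive and strongly negative responses.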

Citation

@article{mun2026livingworld,
  title={LivingWorld: Interactive 4D World Generation with Environmental Dynamics},
  author={Mun, Hyeongju and Jin, In-Hwan and Kim, Sohyeong and Kong, Kyeongbo},
  journal={European Conference on Computer Vision},
  year={2026}
}
