Structural Priors for Vision

ICCV 2025 Workshop, Honolulu, Hawai'i
October 19-23, 2025, TBD

In recent years, there has been a growing trend toward training data-centric, large-scale foundation models that reduce reliance on structural priors. However, is simply scaling up Transformers truly the ultimate solution for computer vision? In this workshop, we aim to reintroduce structural priors and explore how they can further push the boundaries of foundation models.

Our workshop provides an interdisciplinary space for sharing ideas across domains. For example, scene-aware 2D perception can improve 3D modeling and robotic manipulation, while geometric reasoning can strengthen the visual grounding of 2D perception and multimodal models. Through these interactions, we aim to better define the role of priors in vision foundation models.

Topics include, but are not limited to:

  • Scene-aware vision models for images and videos.
  • Geometry and equivariance for 3D vision.
  • Temporal and motion priors for videos.
  • Behavioral priors for robotics and egocentric views.
  • Physics priors for world models and interactions.

Keynote Speakers

  • Danfei Xu (Georgia Tech & NVIDIA)
  • João Carreira (Google DeepMind)
  • Jiajun Wu (Stanford)
  • Kristen Grauman (UT Austin)
  • Saining Xie (NYU)
  • Vincent Sitzmann (MIT)

Schedule

  08:50-09:00  Opening Remarks and Welcome
  09:00-09:40  Keynote Talk: Speaker TBD, Title TBD
  09:40-10:20  Keynote Talk: Speaker TBD, Title TBD
  10:20-10:40  Coffee Break
  10:40-11:20  Keynote Talk: Speaker TBD, Title TBD
  11:20-11:35  Spotlight Talk: Title TBD
  11:35-11:50  Spotlight Talk: Title TBD
  11:50-12:30  Lunch Break
  12:30-13:30  Accepted Paper Poster Session
  13:30-14:10  Keynote Talk: Speaker TBD, Title TBD
  14:10-14:50  Keynote Talk: Speaker TBD, Title TBD
  14:50-15:10  Coffee Break
  15:10-15:50  Keynote Talk: Speaker TBD, Title TBD
  15:50-16:05  Spotlight Talk: Title TBD
  16:05-16:20  Spotlight Talk: Title TBD
  16:20-16:30  Closing Remarks
  16:30-17:30  Accepted Paper Poster Session

Accepted Papers

  • Ground-Displacement Forecasting from Satellite Image Time Series via a Koopman-Prior Autoencoder
    Takayuki Shinohara
  • Spatial Mental Modeling from Limited Views
    Baiqiao Yin, Qineng Wang, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei
  • SEAL-Pose: Enhancing 3D Human Pose Estimation through Trainable Loss Function
    Junggeun Do, Jay-Yoon Lee
  • StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation
    Haodong Li, Chen Wang, Jiahui Lei, Kostas Daniilidis, Lingjie Liu
  • VidMP3: Video Editing by Representing Motion with Pose and Position Priors
    Sandeep Mishra, Oindrila Saha, Alan Bovik
  • The Diashow Paradox: Stronger 3D-Aware Representations Emerge from Image Sets, Not Videos
    Nguyen Tien Duc, Anna Sonnweber, Mark Weber, Nikita Araslanov, Daniel Cremers
  • Identity-Motion Trade-offs in Text-to-Video via Query-Guided Attention Priors
    Yuval Atzmon, Rinon Gal, Yoad Tewel, Yoni Kasten, Gal Chechik
  • Axis-level Symmetry Detection with Group-Equivariant Representation
    Wongyun Yu, Ahyun Seo, Minsu Cho
  • Combinative Matching for Geometric Shape Assembly
    Nahyuk Lee, Juhong Min, Junhong Lee, Chunghyun Park, Minsu Cho
  • Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
    Phillip Y. Lee, Jihyeon Je, Chanho Park, Mikaela Angelina Uy, Leonidas Guibas, Minhyuk Sung
  • Generic Event Boundary Detection via Denoising Diffusion
    Jaejun Hwang, Dayoung Gong, Manjin Kim, Minsu Cho
  • SHED Light on Segmentation for Depth Estimation
    Seung Hyun Lee, Sangwoo Mo, Stella X. Yu
  • Few-Shot Pattern Detection via Template Matching and Regression
    Eunchan Jo, Dahyun Kang, Sanghyun Kim, Yunseon Choi, Minsu Cho
  • MultiViewPano: A Generalist Approach to 360-degree Panorama Generation
    Simon Coessens, Akash Malhotra, Nacera Seghouani
  • LACONIC: A 3D Layout Adapter for Controllable Image Creation
    Léopold Maillard, Tom Durand, Adrien Ramanana Rahary, Maks Ovsjanikov
  • SuperDec: 3D Scene Decomposition with Superquadric Primitives
    Elisabetta Fedele, Boyang Sun, Leonidas Guibas, Marc Pollefeys, Francis Engelmann
  • Injecting Geometric Scene Priors into Vision Transformers for Improved 2D-3D Understanding
    Laura Tran-Dubois

Organizers

  • Sangwoo Mo (UMich)
  • Congyue Deng (Stanford)
  • Hila Chefer (Black Forest Labs)
  • Daniel Zoran (Google DeepMind)
  • Kaichun Mo (NVIDIA)
  • Leonidas Guibas (Stanford)
  • Stella Yu (UMich)