Haven (Haiwen) Feng

Hi, I'm Haven! I'm a 4th year PhD student at Max Planck Institute for Intelligent Systems (MPI-IS), advised by Michael J. Black.
Since Autumn 2024, I have been visiting Berkeley AI Research, where I am advised by Angjoo Kanazawa. I also spent a wonderful summer with Marc Levoy's group at Adobe in 2023, hosted by Cecilia Zhang.

Before my PhD, I received M.Sc. and B.Sc. degrees in Physics from the University of Tübingen. For my master's, I researched disentangled representation learning with Wieland Brendel and Matthias Bethge at the Tübingen AI Center. During my bachelor's, I worked on 3D face reconstruction with Timo Bolkart and Michael J. Black at MPI-IS.
I was born and raised in Beijing.

I will be on the job market in Spring 2025!

Email  /  Scholar  /  Twitter  /  LinkedIn  /  GitHub


Research

I'm interested in machine learning, computer vision, and computer graphics, and particularly in how to understand and recreate our physical world in a principled and generalizable way. My research centers on structural understanding and generation of visual data, with a focus on inverse graphics and controllable generation with foundation models.

Selected publications are listed below.

GenLit: Reformulating Single-Image Relighting as Video Generation
Shrisha Bharadwaj*, Haiwen Feng*, Victoria Fernandez Abrevaya, Michael J. Black
(*Equal contribution, listed alphabetically)
Preprint, 2025
project page / arXiv

Could we reformulate single-image relighting as a video generation task? By leveraging video diffusion models and a small synthetic dataset, we achieve realistic relighting effects with cast shadows on real images without explicit 3D. Our work reveals the potential of foundation models in understanding physical properties and performing graphics tasks.

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman*, Haiwen Feng*†, Michael J. Black, Dimitrios Tzionas, Victoria Fernandez Abrevaya
(*Equal contribution, †Project lead)
Preprint, 2025
project page / arXiv

Can we generate physical interactions without physics simulation? We leverage video foundation models as implicit physics simulators. Given an initial frame and a control signal of a driving object, InterDyn generates plausible, temporally consistent videos of complex object interactions. Our work demonstrates the potential of using video generative models to understand and predict real-world physical dynamics.

SGP-Bench: Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu*, Weiyang Liu*, Haiwen Feng*, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf
(*Joint first author)
ICLR, 2025   (Spotlight Presentation)
project page / arXiv / code

We asked whether LLMs can "imagine" how the graphics content corresponding to a symbolic program would look, without ever seeing it visually. This task requires both low-level skills (e.g., counting objects, identifying colors) and high-level reasoning (e.g., interpreting affordances, understanding semantics). Our benchmark effectively differentiates models by their reasoning abilities, with performance consistently improving with model scale.

IG-LLM: Re-Thinking Inverse Graphics With Large Language Models
Peter Kulits*, Haiwen Feng*, Weiyang Liu, Victoria Abrevaya, Michael J. Black
(*Co-first author)
TMLR, 2024   (Selected for presentation at ICLR 2025)
project page / arXiv

We explored whether inverse graphics can be approached as a code-generation task and found that it generalizes surprisingly well to out-of-distribution cases. However, is code generation optimal for graphics? Our research identifies a fundamental limitation of LLMs for parameter estimation and offers a simple but effective solution.

Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang
ECCV, 2024
project page / arXiv

We proposed "Time-Reversal Fusion" to enable the image-to-video model to generate towards a given end frame without any tuning. It not only provides a unified solution for three visual tasks but also probes the dynamic generation capability of the video diffusion model.

OFT: Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Zeju Qiu*, Weiyang Liu*, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf
NeurIPS, 2023
project page / arXiv / code

We proposed a principled parameter-efficient fine-tuning (PEFT) method that orthogonally fine-tunes the pretrained model, resulting in superior alignment and faster convergence for controllable synthesis.

ArtEq: Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance
Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya
ICCV, 2023   (Oral Presentation)
project page / arXiv / code

We extended SE(3) equivariance to articulated scenarios, achieving principled generalization to out-of-distribution body poses with 60% lower error, while being 1000 times faster and only 2.7% of the size of the previous state-of-the-art model.

TRUST: Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation
Haiwen Feng, Timo Bolkart, Joachim Tesch, Michael J. Black, Victoria Abrevaya
ECCV, 2022
project page / arXiv / code

We conducted a systematic analysis of skin tone bias in 3D face albedo reconstruction and proposed the first unbiased albedo estimation evaluation suite (benchmark + metric). Additionally, we developed a principled method that reduces this bias by 80%.

DECA: Learning an Animatable Detailed 3D Face Model from In-The-Wild Images
Yao Feng*, Haiwen Feng*, Michael J. Black, Timo Bolkart
(*Equal contribution)
SIGGRAPH, 2021
project page / arXiv / code

We built the first animatable facial detail model learned purely from in-the-wild images that generalizes to novel expressions.


The template is stolen from Jon Barron.