Haven (Haiwen) Feng

Hi, I'm Haven! I'm a third-year PhD student at the Max Planck Institute for Intelligent Systems (MPI-IS), advised by Michael J. Black.
Starting in Autumn 2024, I will be visiting Berkeley AI Research, where I will be advised by Angjoo Kanazawa. I also spent a wonderful summer with Marc Levoy's group at Adobe in 2023, hosted by Cecilia Zhang.

Before my PhD, I received M.Sc. and B.Sc. degrees in Physics from the University of Tübingen. For my master's, I researched disentangled representation learning with Wieland Brendel and Matthias Bethge at the Tübingen AI Center. During my bachelor's, I worked on 3D face reconstruction with Timo Bolkart and Michael J. Black at MPI-IS.
I was born and raised in Beijing.

Email / Scholar / Twitter / LinkedIn / GitHub


Research

I'm interested in machine learning, computer vision, and computer graphics, and particularly in how to understand and recreate our physical world in a principled and generalizable way. My research centers on visual inverse problems and controllable generation with foundation models of various kinds. Selected publications are listed below (* indicates equal contribution).

SGP-Bench: Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu*, Weiyang Liu*, Haiwen Feng*, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf
Preprint, 2024
project page / arXiv / code

We asked whether LLMs can "imagine" how the graphics content corresponding to a symbolic program would look without ever seeing it! This task requires both low-level skills (e.g., counting objects, identifying colors) and high-level reasoning (e.g., interpreting affordances, understanding semantics). Our benchmark effectively differentiates models by their reasoning abilities, with performance consistently following scaling laws!

IG-LLM: Re-Thinking Inverse Graphics With Large Language Models
Peter Kulits*, Haiwen Feng*, Weiyang Liu, Victoria Abrevaya, Michael J. Black
TMLR, 2024
project page / arXiv

We explored whether inverse graphics could be approached as a code generation task and found that it generalizes surprisingly well to OOD cases! However, is it optimal for graphics? Our research identifies a fundamental limitation of LLMs for parameter estimation and offers a simple but effective solution.

Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang
ECCV, 2024
project page / arXiv

We proposed "Time-Reversal Fusion", which enables an image-to-video model to generate towards a given end frame without any tuning. It not only provides a unified solution for three visual tasks but also probes the dynamic generation capability of the video diffusion model.

OFT: Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Zeju Qiu*, Weiyang Liu*, Haiwen Feng, Yuxuan Xue, Yao Feng, Zhen Liu, Dan Zhang, Adrian Weller, Bernhard Schölkopf
NeurIPS, 2023
project page / arXiv / code

We proposed a principled PEFT method by orthogonally fine-tuning the pretrained model, resulting in superior alignment and faster convergence for controllable synthesis.

ArtEq: Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance
Haiwen Feng, Peter Kulits, Shichen Liu, Michael J. Black, Victoria Abrevaya
ICCV, 2023 (Oral Presentation)
project page / arXiv / code

We extended SE(3) equivariance to articulated scenarios, achieving principled generalization to OOD body poses with 60% less error, using a network that is 1000 times faster and only 2.7% of the size of the previous state-of-the-art model.

TRUST: Towards Racially Unbiased Skin Tone Estimation via Scene Disambiguation
Haiwen Feng, Timo Bolkart, Joachim Tesch, Michael J. Black, Victoria Abrevaya
ECCV, 2022
project page / arXiv / code

We conducted a systematic analysis of skin tone bias in 3D face albedo reconstruction and proposed the first unbiased albedo estimation evaluation suite (benchmark + metric). Additionally, we developed a principled method that reduces this bias by 80%.

DECA: Learning an Animatable Detailed 3D Face Model from In-The-Wild Images
Yao Feng*, Haiwen Feng*, Michael J. Black, Timo Bolkart
SIGGRAPH, 2021
project page / arXiv / code

We built the first animatable facial detail model that is learned purely from in-the-wild images and generalizes to new expressions.


The template is stolen from Jon Barron.