We present FaceLift, a novel feed-forward approach for generalizable, high-quality 360-degree head reconstruction from a single image. Our pipeline first employs a multi-view latent diffusion model to generate consistent side and back views from a single facial input, which then feed into a transformer-based reconstructor that produces a comprehensive 3D Gaussian Splats representation. Previous methods for monocular 3D face reconstruction often lack full view coverage or view consistency due to insufficient multi-view supervision. We address this by creating a high-quality synthetic head dataset that enables consistent supervision across viewpoints. To bridge the domain gap between synthetic training data and real-world images, we propose a simple yet effective technique that ensures the view-generation process maintains fidelity to the input by learning to reconstruct the input image alongside view generation. Despite being trained exclusively on synthetic data, our method demonstrates remarkable generalization to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art 3D face reconstruction methods in identity preservation, detail recovery, and rendering quality.
Overview of FaceLift. Given a single image of a human face as input, we train an image-conditioned, multi-view diffusion model to generate novel views covering the entire head. By reconstructing the input image and leveraging high-quality synthetic data, our multi-view latent diffusion model can hallucinate unseen views of the human head with high fidelity and multi-view consistency. We then train a transformer-based reconstructor, which takes the multi-view images and their camera poses as input and generates 3D Gaussian Splats to represent the human head.
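The reconstructor consumes camera poses alongside the generated views. As a rough illustration of what a set of poses covering the full head could look like, the sketch below places cameras on a circle around the origin, each looking at the center. The number of views, the radius, the zero elevation, and the y-up, look-along-negative-z convention are all assumptions made for illustration; the text above does not specify them, and `look_at_pose` and `orbit_poses` are hypothetical helper names.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def look_at_pose(azimuth_deg, elevation_deg=0.0, radius=2.0):
    """Camera-to-world 4x4 matrix for a camera on a sphere, looking at the origin.

    Assumed convention: world is y-up, the camera looks along its own -z axis.
    Elevations of +/-90 degrees are degenerate (forward is parallel to up).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    eye = (
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
        radius * math.cos(el) * math.cos(az),
    )
    forward = _normalize(tuple(-c for c in eye))  # from camera toward origin
    right = _normalize(_cross(forward, (0.0, 1.0, 0.0)))
    up = _cross(right, forward)
    back = tuple(-c for c in forward)  # camera +z axis in world coordinates
    # Columns are: right, up, back, camera position.
    return [
        [right[0], up[0], back[0], eye[0]],
        [right[1], up[1], back[1], eye[1]],
        [right[2], up[2], back[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def orbit_poses(num_views=6, elevation_deg=0.0, radius=2.0):
    """Evenly spaced azimuths over 360 degrees (the view count is an assumption)."""
    return [
        look_at_pose(360.0 * i / num_views, elevation_deg, radius)
        for i in range(num_views)
    ]


poses = orbit_poses(num_views=6)
```

Each generated view would be paired with one such matrix before being handed to a reconstructor; the actual camera layout used by FaceLift may differ.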
FaceLift lifts a single facial image to a detailed 3D reconstruction with preserved identity features.
Given a video as input, FaceLift processes each frame individually and generates a sequence of 3D Gaussian representations, which enables 4D novel view synthesis.
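The per-frame processing described above amounts to mapping a single-image lifting function over the frames of a video. The following is a minimal sketch of that control flow only: `lift_frame` and `render` are hypothetical callables standing in for the single-image reconstruction and a Gaussian renderer, and are not part of any released FaceLift API.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Splats = TypeVar("Splats")
Pose = TypeVar("Pose")
Image = TypeVar("Image")


def lift_video(
    frames: Iterable[Frame],
    lift_frame: Callable[[Frame], Splats],
) -> List[Splats]:
    """Lift every frame independently; no state is shared between frames."""
    return [lift_frame(frame) for frame in frames]


def render_sequence(
    splat_sequence: Iterable[Splats],
    render: Callable[[Splats, Pose], Image],
    pose: Pose,
) -> List[Image]:
    """Render each per-frame reconstruction from the same novel camera pose."""
    return [render(splats, pose) for splats in splat_sequence]


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
toy_frames = ["frame0", "frame1", "frame2"]
toy_lift = lambda frame: {"source": frame}
toy_render = lambda splats, pose: (splats["source"], pose)

sequence = lift_video(toy_frames, toy_lift)
novel_views = render_sequence(sequence, toy_render, pose="side")
```

Because each frame is handled on its own, this structure adds no explicit temporal coupling; any consistency across frames has to come from the per-frame reconstruction itself.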
FaceLift can be combined with 2D face animation methods like LivePortrait to achieve 3D face animation.
@misc{lyu2024facelift,
  title={FaceLift: Single Image to 3D Head with View Generation and GS-LRM},
  author={Weijie Lyu and Yi Zhou and Ming-Hsuan Yang and Zhixin Shu},
  year={2024},
  eprint={2412.17812},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.17812}
}