DDA

Abstract

Estimating 3D human pose and shape (HPS) from a monocular image has many applications. However, collecting ground-truth data for this problem is costly and constrained to limited lab environments. Researchers have used priors based on body structure or kinematics, cues obtained from other vision tasks such as optical flow and segmentation, and self-supervised tasks to mitigate the scarcity of supervision. Despite its apparent potential in this context, monocular depth estimation has yet to be explored. In this paper, we propose the Dense Depth Alignment (DDA) method, where we use an estimated dense depth map to create an auxiliary supervision signal for 3D HPS estimation. Specifically, we define a dense mapping between the points on the surface of the human mesh and the points reconstructed from depth estimation. We further introduce the idea of Camera Pretraining, a novel learning strategy where, instead of estimating all parameters simultaneously, learning of camera parameters is prioritized (before pose and shape parameters) to avoid unwanted local minima. Our experiments on Human3.6M and 3DPW datasets show that our DDA loss and Camera Pretraining significantly improve HPS estimation performance over using only 2D keypoint supervision or 2D and 3D supervision. Code will be provided for research purposes.
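The depth-alignment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the dense mapping is not specified here, so the sketch assumes a pinhole camera (hypothetical parameters `f`, `cx`, `cy`), a depth map already expressed in the same metric scale as the mesh, and a simple correspondence in which each mesh surface point is compared with the point back-projected from the pixel it lands on. Visibility handling and scale/shift alignment of the estimated depth are left out.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def project(p: Point3, f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = p
    return (f * x / z + cx, f * y / z + cy)


def backproject(u: float, v: float, depth: float,
                f: float, cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with its depth value to a 3D camera-space point."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def depth_alignment_loss(mesh_points: Sequence[Point3],
                         depth_map: List[List[float]],
                         f: float, cx: float, cy: float) -> float:
    """Mean L1 distance between mesh points and depth-reconstructed points.

    Assumption (not from the source): correspondence is found by projecting
    each mesh point into the image and reading the depth at the nearest pixel.
    """
    height, width = len(depth_map), len(depth_map[0])
    total, count = 0.0, 0
    for p in mesh_points:
        if p[2] <= 0:  # behind the camera, cannot be projected
            continue
        u, v = project(p, f, cx, cy)
        col, row = int(round(u)), int(round(v))
        if not (0 <= row < height and 0 <= col < width):
            continue  # falls outside the depth map
        q = backproject(col, row, depth_map[row][col], f, cx, cy)
        total += sum(abs(a - b) for a, b in zip(p, q))
        count += 1
    return total / count if count else 0.0
```

With a flat depth map at 2.0 units, a mesh point lying on that plane contributes zero loss, while a point half a unit behind it contributes 0.5, so the loss pulls the mesh surface toward the depth-reconstructed surface.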

Camera Pretraining

Training objectives that minimize point-to-point distances tend to get stuck in local minima (Figure 2). The cause is the non-linear nature of forward kinematics: losses such as the 2D re-projection and 3D joint losses (and our DDA loss) put the burden of solving the inverse problem on the model itself. We find that prioritizing the optimization of the camera pose, which is the root of the kinematic chain, greatly alleviates the problem. To do this, we simply pre-train our models to learn only the camera parameters while the body pose and shape are kept fixed.
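As a toy illustration of this stage, the sketch below fits only the camera while the body is held fixed. It is a per-sample optimization rather than network training, and every detail is an assumption made for illustration: a weak-perspective camera `(s, tx, ty)` with projection `u = s * x + tx`, a 2D re-projection loss, and plain gradient descent. The name `pretrain_camera` is hypothetical.

```python
from typing import List, Sequence, Tuple

Joint3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Camera = Tuple[float, float, float]  # (scale, tx, ty), weak perspective


def reprojection_loss(joints3d: Sequence[Joint3],
                      keypoints2d: Sequence[Point2],
                      cam: Camera) -> float:
    """Mean squared 2D error of the projected joints."""
    s, tx, ty = cam
    total = 0.0
    for (x, y, _z), (u, v) in zip(joints3d, keypoints2d):
        total += (s * x + tx - u) ** 2 + (s * y + ty - v) ** 2
    return total / len(joints3d)


def camera_gradient(joints3d: Sequence[Joint3],
                    keypoints2d: Sequence[Point2],
                    cam: Camera) -> Camera:
    """Analytic gradient of reprojection_loss with respect to (s, tx, ty)."""
    s, tx, ty = cam
    n = len(joints3d)
    gs = gtx = gty = 0.0
    for (x, y, _z), (u, v) in zip(joints3d, keypoints2d):
        ru = s * x + tx - u  # horizontal residual
        rv = s * y + ty - v  # vertical residual
        gs += 2.0 * (ru * x + rv * y) / n
        gtx += 2.0 * ru / n
        gty += 2.0 * rv / n
    return (gs, gtx, gty)


def pretrain_camera(joints3d: Sequence[Joint3],
                    keypoints2d: Sequence[Point2],
                    cam: Camera = (1.0, 0.0, 0.0),
                    lr: float = 0.1,
                    steps: int = 500) -> Camera:
    """Stage 1: update only the camera; joints3d (pose and shape) stay fixed."""
    params: List[float] = list(cam)
    for _ in range(steps):
        grad = camera_gradient(joints3d, keypoints2d, tuple(params))
        params = [p - lr * g for p, g in zip(params, grad)]
    return (params[0], params[1], params[2])
```

In a full pipeline, the camera obtained this way would serve as the starting point for a second stage in which pose and shape are also optimized. The sketch stops after the first stage.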

BibTeX

@article{karagoz2023dda,
  author    = {Karagoz, Batuhan and Suat, Ozhan and Uguz, Bedirhan and Akbas, Emre},
  title     = {Dense Depth Alignment for Human Pose and Shape Estimation},
  journal   = {----},
  year      = {----},
}
