- Albert J. Zhai
- Yuan Shen
- Emily Y. Chen
- Gloria X. Wang
- Xinlei Wang
- Sheng Wang
- Kaiyu Guan
- Shenlong Wang
- University of Illinois at Urbana-Champaign
Abstract
Can computers perceive the physical properties of objects solely through vision? Research in cognitive science and vision science has shown that humans excel at identifying materials and estimating their physical properties based purely on visual appearance. In this paper, we present a novel approach for dense prediction of the physical properties of objects using a collection of images. Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object. We then construct a language-embedded point cloud and estimate the physical properties of each 3D point using a zero-shot kernel regression approach. Our method is accurate, annotation-free, and applicable to any object in the open world. Experiments demonstrate the effectiveness of the proposed approach in various physical property reasoning tasks, such as estimating the mass of common objects, as well as other properties like friction and hardness.
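To make the zero-shot kernel regression step concrete, the sketch below estimates a per-point property as a similarity-weighted average over candidate material values. This is our reading of the abstract, not the authors' exact implementation: the function name `predict_property`, the `temperature` parameter, and the softmax form of the kernel are all assumptions.

```python
import numpy as np

def predict_property(point_feats, mat_text_feats, mat_values, temperature=0.1):
    """Zero-shot kernel regression: each 3D point's property is a
    similarity-weighted average of candidate material property values.

    point_feats:    (N, D) unit-normalized language-embedded point features
    mat_text_feats: (M, D) unit-normalized text embeddings of candidate materials
    mat_values:     (M,)   property value per material (e.g., density in kg/m^3)
    """
    sims = point_feats @ mat_text_feats.T          # (N, M) cosine similarities
    w = np.exp(sims / temperature)                 # softmax kernel weights
    w /= w.sum(axis=1, keepdims=True)
    return w @ mat_values                          # (N,) per-point predictions
```

Because `mat_values` can hold any scalar property the language model proposes values for, the same kernel applies unchanged to density, hardness, friction, and so on.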
Predicting Physical Properties
We visualize input images of objects from the ABO dataset alongside the PCA components of our model's CLIP features, its zero-shot material segmentation, and its predicted mass density.
Our model makes reasonable material predictions across different parts of objects in 3D, enabling grounded predictions of physical properties.
Our method predicts physical properties in an open-vocabulary manner.
We show that it can estimate mass density, Young's modulus, thermal conductivity, hardness, and friction coefficients, all without supervision.
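Per-point predictions can then be aggregated into object-level quantities such as total mass. The sketch below is a minimal example under the assumption that the sampled points cover the object's volume roughly uniformly; `estimate_mass` and its arguments are illustrative names, not the paper's exact procedure.

```python
def estimate_mass(densities, object_volume):
    """Aggregate per-point density predictions into a total mass.

    densities:     (N,) predicted densities in kg/m^3, one per sampled 3D point
    object_volume: scalar object volume in m^3 (e.g., from the reconstruction)

    Assumes the N points sample the volume uniformly, so the mean density
    times the volume approximates the integral of density over the object.
    """
    return densities.mean() * object_volume
```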
Physically Realistic Digital Twins
We show that realistic physical interactions can be simulated using mass-aware digital twins created by NeRF2Physics, enabling applications in immersive computing and simulation.
Here, the ball hits each object with the same initial momentum.
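As a rough sketch of how such a mass-aware interaction could be set up in an off-the-shelf physics engine (here PyBullet), the example below assigns a predicted mass to a digital twin and launches a ball at it with a fixed initial momentum. The asset files, mass values, and momentum are placeholders, and this is not the pipeline used to produce the results above.

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)  # headless; use p.GUI to watch the interaction
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
p.loadURDF("plane.urdf")

# Placeholder digital twin; in practice this would be the reconstructed mesh.
obj = p.loadURDF("duck_vhacd.urdf", basePosition=[0, 0, 0.1])
predicted_mass = 0.8  # kg, e.g., from integrating predicted density (assumed)
p.changeDynamics(obj, -1, mass=predicted_mass)

# Ball launched with the same initial momentum regardless of the target object.
ball = p.loadURDF("sphere2.urdf", basePosition=[-1.0, 0, 0.1], globalScaling=0.2)
ball_mass = 0.5       # kg (assumed)
momentum = 2.0        # kg*m/s, held constant across objects
p.changeDynamics(ball, -1, mass=ball_mass)
p.resetBaseVelocity(ball, linearVelocity=[momentum / ball_mass, 0, 0])

for _ in range(240):  # simulate one second at 240 Hz
    p.stepSimulation()
```

Because the target's response to a fixed-momentum impact depends on its mass, heavier predicted twins barely move while lighter ones are knocked away, which is what the visualization above illustrates.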