Automatically Discovering Local Visual Material Attributes
Gabriel Schwartz and Ko Nishino
Drexel University
Shape cues play an important role in computer vision, but shape is not the only information available in images. Materials, such as fabric and plastic, are discernible in images even when shapes, such as those of an object, are not. We argue that it would be ideal to recognize materials without relying on object cues such as shape. This would allow us to use materials as a context for other vision tasks, such as object recognition. Humans are intuitively able to find visual cues that describe materials. Previous frameworks attempt to recognize these cues, as visual material traits, using fully supervised learning. Meeting this annotation requirement is not feasible when multiple annotators and large quantities of images are involved. In this paper, we derive a framework that allows us to discover locally recognizable material attributes from crowdsourced perceptual material distances. We show that the attributes we discover do in fact separate material categories. Our learned attributes exhibit the same desirable properties as material traits, despite being discovered using only partial supervision.
Automatically Discovering Local Visual Material Attributes
G. Schwartz and K. Nishino,
in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15), Jun. 2015.
[ paper ][ project ]
Overview
Our goal is to discover a set of attributes that exhibit the desirable properties of material traits. We want to achieve this without relying on fully supervised learning. Known material traits, such as "smooth" or "rough," represent visual properties shared between similar materials. We use crowdsourcing (Amazon Mechanical Turk) to determine the visual similarity of material categories as seen by humans. For this, we show image patches of different materials as references and ask whether image patches from other materials look visually similar. These judgments are aggregated to compute pairwise visual distances between material categories.
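To make the aggregation step concrete, here is a minimal sketch, assuming each crowdsourced answer is recorded as a (reference category, other category, judged-similar) triple and that distance is taken as one minus the fraction of annotators who judged the pair similar. Both the data format and the distance definition are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def pairwise_distances(judgments, num_categories):
    """Aggregate binary similarity judgments into a symmetric distance matrix.

    judgments: iterable of (reference category id, other category id,
    judged_similar flag) triples -- an assumed format for illustration.
    """
    similar = np.zeros((num_categories, num_categories))
    total = np.zeros((num_categories, num_categories))
    for ref, other, judged_similar in judgments:
        total[ref, other] += 1
        if judged_similar:
            similar[ref, other] += 1
    # Fraction of annotators who found each pair visually similar.
    sim = np.divide(similar, total, out=np.zeros_like(similar), where=total > 0)
    sim = 0.5 * (sim + sim.T)      # symmetrize across presentation order
    np.fill_diagonal(sim, 1.0)     # a category is fully similar to itself
    return 1.0 - sim               # distance = 1 - perceived similarity

# Toy example: 3 categories (0 = fabric, 1 = plastic, 2 = metal).
judgments = [(0, 1, True), (0, 1, False), (1, 0, True),
             (0, 2, False), (2, 0, False), (1, 2, True), (2, 1, False)]
D = pairwise_distances(judgments, num_categories=3)
print(D)
```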
We then identify a space of material attributes that preserves these pairwise distances while permitting reliable recognition of the attributes on local image patches. For this, we first convert the distance matrix into a category-attribute matrix that realizes desirable characteristics such as sparsity. This can be formulated as an optimization for estimating the material attribute–category probability matrix.
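The sketch below illustrates one way such an optimization could look: a matrix A with entries in [0, 1] is fit so that Euclidean distances between its category rows match the crowdsourced distances D, with an L1 penalty encouraging sparsity. The objective, the row-distance metric, and the solver are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize

def discover_attribute_matrix(D, num_attributes, sparsity=0.1, seed=0):
    """Fit a (categories x attributes) probability matrix whose row
    distances approximate D, with an L1 sparsity penalty (illustrative)."""
    C = D.shape[0]
    rng = np.random.default_rng(seed)

    def objective(a_flat):
        A = a_flat.reshape(C, num_attributes)
        diff = A[:, None, :] - A[None, :, :]
        D_hat = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # row distances
        fit = ((D_hat - D) ** 2).sum()                     # preserve D
        return fit + sparsity * np.abs(A).sum()            # encourage sparsity

    a0 = rng.uniform(0.0, 1.0, size=C * num_attributes)
    bounds = [(0.0, 1.0)] * a0.size                        # probabilities
    res = minimize(objective, a0, method="L-BFGS-B", bounds=bounds)
    return res.x.reshape(C, num_attributes)

# Toy example: three categories with assumed pairwise perceptual distances.
D = np.array([[0.0, 0.8, 0.9],
              [0.8, 0.0, 0.4],
              [0.9, 0.4, 0.0]])
A = discover_attribute_matrix(D, num_attributes=4)
print(np.round(A, 2))
```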
We then train a joint attribute classifier that predicts, on average for each category, the desired attribute likelihoods. Our entire formulation requires no supervised labeling of attributes on the training data. Material categories are separated much more clearly in the discovered attribute space than in the raw image feature space, suggesting that the discovered attributes enable better recognition of categories.
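A minimal sketch of this weakly supervised training, assuming a linear classifier with sigmoid outputs and a squared-error loss between per-category mean predictions and a target matrix A; the paper's actual model, features, and loss may differ. Note that the supervision acts only on per-category averages: no individual patch ever receives an attribute label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_joint_classifier(X, labels, A, lr=0.1, epochs=500):
    """X: (N, d) patch features; labels: (N,) category ids; A: (C, K) targets.
    Drives per-category mean attribute predictions toward A (illustrative)."""
    N, d = X.shape
    C, K = A.shape
    W = np.zeros((d, K))
    counts = np.bincount(labels, minlength=C).astype(float)
    for _ in range(epochs):
        P = sigmoid(X @ W)                 # (N, K) attribute probabilities
        # Average predicted probability of each attribute within each category.
        M = np.zeros((C, K))
        np.add.at(M, labels, P)
        M /= counts[:, None]
        # Gradient of sum((M - A)^2) through the per-category mean and sigmoid.
        G = 2.0 * (M - A)[labels] / counts[labels, None]
        W -= lr * X.T @ (G * P * (1 - P))
    return W

# Toy usage: 2 categories, 5-D patch features, 3 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
labels = rng.integers(0, 2, size=40)
A = np.array([[0.9, 0.1, 0.5],
              [0.2, 0.8, 0.5]])
W = train_joint_classifier(X, labels, A)
```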
Results
The discovered visual material attributes not only aid in local material category recognition but also capture semantic material properties. The figure shows per-pixel probabilities for four of the discovered attributes (one per column), and the discovered attributes exhibit patterns similar to those of known material traits. The first attribute, for example, appears consistently within the woven hat and the koala; the second attribute tends to indicate smooth regions. The last two columns show that we are discovering attributes that can appear both sparsely and densely in an image, depending on the context. These are all properties shared with visual material traits.
We also measured the correlation between discovered attribute predictions and known material traits. Groups of attributes can collectively indicate the presence of a material trait: "metallic," for example, correlates positively with attribute 0 and negatively with attribute 8.
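A minimal sketch of such a correlation analysis; the arrays below are synthetic stand-ins, not the paper's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
attrs = rng.uniform(size=(1000, 10))   # 10 discovered attributes per pixel
# Synthetic "metallic" trait that depends positively on attribute 0
# and negatively on attribute 8, mimicking the reported pattern.
metallic = 0.7 * attrs[:, 0] - 0.5 * attrs[:, 8] + 0.1 * rng.normal(size=1000)

# Pearson correlation of the trait with each discovered attribute.
corr = [np.corrcoef(metallic, attrs[:, k])[0, 1] for k in range(10)]
print(np.round(corr, 2))   # positive for attribute 0, negative for 8
```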