Unsupervised Clustering using Pseudo-semi-supervised Learning

Divam Gupta; Ramachandran Ramjee; Nipun Kwatra; Muthian Sivathanu

Keywords: clustering, ensembles, semi-supervised learning, unsupervised

Thurs Session 4 (17:00-19:00 GMT)

Thurs Session 5 (20:00-22:00 GMT)

Abstract: In this paper, we propose a framework that leverages semi-supervised models to improve unsupervised clustering performance. To use semi-supervised models, we first need to generate labels automatically; we call these pseudo-labels. We find that prior approaches for generating pseudo-labels hurt clustering performance because the labels they produce have low accuracy. Instead, we use an ensemble of deep networks to construct a similarity graph, from which we extract high-accuracy pseudo-labels. We iterate this process of finding high-quality pseudo-labels with the ensemble and training the semi-supervised model on them, and each iteration yields further improvement. We show that our approach outperforms state-of-the-art clustering results on multiple image and text datasets. For example, we achieve 54.6% accuracy on CIFAR-10 and 43.9% on 20news, outperforming the state of the art by 8-12% in absolute terms.
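The pseudo-labeling idea can be illustrated with a minimal sketch. It assumes each ensemble member has already assigned every point a cluster id, and it substitutes a simple unanimous-agreement rule for the paper's own graph construction and label extraction, which may differ. Two points are linked only if every member places them in the same cluster. Because that relation is transitive, the connected components of the graph are exactly the groups of points that share an identical assignment vector, so no explicit graph traversal is needed. The function name `consensus_pseudo_labels` and its `num_classes` and `min_size` parameters are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def consensus_pseudo_labels(
    assignments: Sequence[Sequence[int]], num_classes: int, min_size: int = 1
) -> List[int]:
    """Hypothetical helper: pseudo-labels from an ensemble of clusterings.

    assignments[m][i] is the cluster id that ensemble member m gives point i.
    Two points are linked when *every* member puts them in the same cluster.
    That relation is transitive, so the connected components of this agreement
    graph are exactly the groups of points with identical assignment vectors.
    The `num_classes` largest components (each with at least `min_size` points)
    become pseudo-classes 0..num_classes-1. Every other point gets -1 and is left
    unlabeled for the semi-supervised learner.
    """
    num_points = len(assignments[0])
    if any(len(member) != num_points for member in assignments):
        raise ValueError("every ensemble member must label every point")

    # Group point indices by their full assignment vector across the ensemble.
    groups = defaultdict(list)
    for i in range(num_points):
        key = tuple(member[i] for member in assignments)
        groups[key].append(i)

    # Largest components first; ties are broken by the smallest point index.
    components = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    labels = [-1] * num_points
    for cls, members in enumerate(components[:num_classes]):
        if len(members) < min_size:
            break  # components are sorted by size, so all later ones are smaller
        for i in members:
            labels[i] = cls
    return labels


if __name__ == "__main__":
    # Three toy clusterings of six points (cluster ids are arbitrary per member).
    ensemble = [
        [0, 0, 0, 1, 1, 2],
        [1, 1, 1, 0, 0, 0],
        [2, 2, 0, 1, 1, 1],
    ]
    print(consensus_pseudo_labels(ensemble, num_classes=2))  # [0, 0, -1, 1, 1, -1]
```

Only points 0 and 1, and points 3 and 4, are grouped together by all three members, so they alone receive pseudo-labels. This reflects the trade-off the abstract describes: requiring full agreement gives up coverage in exchange for higher label accuracy. In an iterative scheme, the labeled subset would train a semi-supervised model, whose improved representations would then feed the next round of ensemble clustering.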

Similar Papers

Self-labelling via simultaneous clustering and representation learning

Asano YM., Rupprecht C., Vedaldi A.

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Junnan Li, Richard Socher, Steven C.H. Hoi

Ensemble Distribution Distillation

Andrey Malinin, Bruno Mlodozeniec, Mark Gales

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

Yixiao Ge, Dapeng Chen, Hongsheng Li