Unsupervised Clustering using Pseudo-semi-supervised Learning

Divam Gupta, Ramachandran Ramjee, Nipun Kwatra, Muthian Sivathanu

Keywords: clustering, ensembles, semi supervised learning, unsupervised

Thurs Session 4 (17:00-19:00 GMT) [Live QA] [Cal]
Thurs Session 5 (20:00-22:00 GMT) [Live QA] [Cal]

Abstract: In this paper, we propose a framework that leverages semi-supervised models to improve unsupervised clustering performance. To leverage semi-supervised models, we first need to automatically generate labels, called pseudo-labels. We find that prior approaches for generating pseudo-labels hurt clustering performance because of their low accuracy. Instead, we use an ensemble of deep networks to construct a similarity graph, from which we extract high accuracy pseudo-labels. The approach of finding high quality pseudo-labels using ensembles and training the semi-supervised model is iterated, yielding continued improvement. We show that our approach outperforms state of the art clustering results for multiple image and text datasets. For example, we achieve 54.6% accuracy for CIFAR-10 and 43.9% for 20news, outperforming state of the art by 8-12% in absolute terms.

Similar Papers

Ensemble Distribution Distillation
Andrey Malinin, Bruno Mlodozeniec, Mark Gales,