Date / Time
14/12/2017, 11:00 - 12:00
Location
ENAC, Building Caudron, Room C016
Ballstering: Density peak clustering for large datasets
Ballstering belongs to the family of machine learning methods that group the objects of a dataset into classes without any prior knowledge of the true classes. Such methods, of which k-means is one of the best-known representatives, are called clustering methods. Recently, a new clustering algorithm, "Fast Density Peak Clustering" (FDPC), has attracted great interest from the scientific community for its innovative approach and its efficiency on non-concentric distributions.
However, this algorithm has such a high complexity that it cannot easily be applied to large datasets. Moreover, we identified several weaknesses that degrade the quality of its results, as well as its reliance on a global parameter, dc, which is difficult to choose yet has a significant impact on the results.
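To make these two limitations concrete, here is a minimal sketch of the standard density peak formulation that FDPC builds on. It is not Ballstering or ICMDW, whose details are not given here. Each point receives a local density ρ (the number of neighbours closer than dc) and a distance δ to its nearest denser point. The points with the largest ρ·δ become cluster centers, and every other point inherits the label of its nearest denser neighbour. The function name `fdpc`, the cutoff-count density, and the rule of keeping the `n_clusters` points with the largest ρ·δ as centers are illustrative assumptions. The full pairwise distance matrix shows where the quadratic cost comes from, and dc directly determines every ρ.

```python
import math


def fdpc(points, dc, n_clusters):
    """Hypothetical minimal density peak clustering sketch (cutoff-count density).

    points: list of coordinate tuples; dc: cutoff distance;
    n_clusters: number of density peaks kept as cluster centers (assumed rule).
    """
    n = len(points)
    # Full pairwise distance matrix: O(n^2) time and memory, the cost that
    # makes this approach hard to apply to large datasets.
    d = [[math.dist(p, q) for q in points] for p in points]

    # Density estimation: rho_i = number of other points closer than dc.
    # The choice of dc changes every rho_i, and hence the whole result.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]

    # delta_i = distance to the nearest point of higher density (ties broken
    # by sort order); for the densest point, its largest distance to any point.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(d[i])
            continue
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i] = d[i][j]
        nearest_higher[i] = j

    # Centers: the densest point (which always has maximal rho * delta) plus
    # the remaining points with the largest rho * delta.
    gamma = [rho[i] * delta[i] for i in range(n)]
    others = sorted(order[1:], key=lambda i: gamma[i], reverse=True)
    centers = [order[0]] + others[: n_clusters - 1]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Clustering: in decreasing density order, each non-center point takes the
    # label of its nearest denser neighbour, which is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


if __name__ == "__main__":
    # Two well-separated square blobs, each with a dense middle point.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(fdpc(pts, dc=1.0, n_clusters=2))
```

On this toy input, the two middle points are the density peaks and each corner is attached to the peak of its own blob. Changing `dc` (for example to 1.5, which gives every point of a blob the same density) changes which points are treated as peaks. That sensitivity is the difficulty with dc described above.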
In view of these limitations, we revisited the core idea of FDPC and modified it step by step, finally creating a distinct algorithm that we called Ballstering. Like its precursor, Ballstering works in two phases: a density estimation phase followed by a clustering phase. Its design is mainly based on ICMDW, a procedure that handles the first phase with lower complexity while also avoiding the difficult choice of dc, which is instead defined automatically according to the local density. We also overhauled the core definitions of FDPC and entirely reworked the second phase (relying on the graph structure of ICMDW's intermediate results), producing an algorithm that overcomes all the limitations we identified.
Vincent Courjault-Radé, DEVI, ENAC, Toulouse.