Date - Heure / Date - Hour
Date(s) - 17/04/2018
10h30 - 12h30
Emplacement / Location
ENAC, Building Breguet, Amphi Breguet
Ballstering : a clustering algorithm for large datasets.
Ballstering belongs to the machine learning methods that aim to group in classes a set of objects that form the studied dataset, without any knowledge of true classes within it. This type of methods, of which k-means is one of the most famous representative, are named clustering methods. Recently, a new clustering algorithm “Fast Density Peak Clustering” (FDPC) has aroused great interest from the scientific community for its innovating aspect and its efficiency on non-concentric distributions. However this algorithm showed a such complexity that it can’t be applied with ease on large datasets. Moreover, we have identified several weaknesses that impact the quality results and the presence of a general parameter dc, difficult to choose while having a significant impact on the results. In view of those limitations, we reworked the principal idea of FDPC in a new light and modified it successively to finally create a distinct algorithm that we called Ballstering.
The work carried out during those three years can be summarised by the conception of this clustering algorithm especially designed to be effective on large datasets. As its Precursor, Ballstering works in two phases: An estimation density phase followed by a clustering step. Its conception is mainly based on a procedure that handle the first step with a lower complexity while avoiding at the same time the difficult choice of dc , which becomes automatically defined according to local density. We name ICMDW this procedure which represent a consistent part of our contributions. We also overhauled cores definitions of FDPC and entirely reworked the second phase (relying on the graph structure of ICMDW’s intermediate results), to finally produce an algorithm that overcome all the limitations that we have identified.
Vincent Courjault-Radé, ENAC, DEVI, Toulouse, France.