Date / Time
Date(s) - 17/04/2018
10:30 - 12:30
Location
ENAC, Building Breguet, Amphi Breguet
Ballstering: a clustering algorithm for large datasets
Ballstering belongs to the family of machine learning methods that group the objects of a dataset into classes without any prior knowledge of the true classes. Such methods, of which k-means is one of the best-known representatives, are called clustering methods. Recently, a new clustering algorithm, “Fast Density Peak Clustering” (FDPC), has attracted great interest from the scientific community for its innovative approach and its efficiency on non-concentric distributions. However, its complexity is so high that it cannot easily be applied to large datasets. Moreover, we identified several weaknesses that affect the quality of its results, as well as its reliance on a global parameter dc, which is difficult to choose yet has a significant impact on the results. In view of these limitations, we revisited the core idea of FDPC and modified it step by step, finally arriving at a distinct algorithm that we called Ballstering.
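To make the two limitations attributed to FDPC concrete, here is a minimal Python sketch of the general density-peak idea: a density phase that counts neighbours within a hand-chosen cutoff dc, followed by a clustering phase in which each point joins its nearest neighbour of higher density. This is an illustrative assumption-laden sketch, not Ballstering, not ICMDW, and not the reference FDPC implementation. The function name `fdpc`, the hard-cutoff density, the tie-breaking by index and the choice of centres as the densest point plus the top `n_centers - 1` values of rho × delta are all hypothetical simplifications.

```python
import math


def fdpc(points, dc, n_centers):
    """Hypothetical density-peak clustering sketch (not Ballstering, not ICMDW)."""
    n = len(points)
    # Phase 1: density estimation. The full distance matrix costs O(n^2) time and memory.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho_i: number of other points strictly closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Phase 2: clustering. Visit points by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention for the densest point: its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        # delta_i: distance to the closest point ranked before i (higher density).
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Centres (assumption): the densest point plus the largest rho * delta values.
    ranked = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + ranked[: n_centers - 1]
    labels = [-1] * n
    for cluster_id, c in enumerate(centres):
        labels[c] = cluster_id
    # Every other point inherits the label of its nearest higher-density neighbour,
    # which is already labelled because points are visited by decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels


# Two well-separated square blobs, each with a central point.
blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(fdpc(blobs, dc=1.0, n_centers=2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The quadratic distance matrix and the fixed, user-supplied `dc` in this sketch correspond to the scalability and parameter-choice issues described above; on the same data, a much smaller or larger `dc` would change the densities and therefore the selected centres.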
The work carried out over these three years can be summarised as the design of this clustering algorithm, specifically intended to be effective on large datasets. Like its precursor, Ballstering works in two phases: a density estimation phase followed by a clustering phase. Its design rests mainly on a procedure that handles the first phase with lower complexity while avoiding the difficult choice of dc, which is instead defined automatically according to local density. We call this procedure ICMDW; it represents a substantial part of our contributions. We also overhauled the core definitions of FDPC and entirely reworked the second phase (relying on the graph structure of ICMDW's intermediate results), finally producing an algorithm that overcomes all the limitations we identified.
Vincent Courjault-Radé, ENAC, DEVI, Toulouse, France.