Performance Analysis of Rule Based Automatic SNN Algorithm on Big Data Sets

Cavus A., KARABİNA A., Kılıç E.

26th IEEE Signal Processing and Communications Applications Conference (SIU), İzmir, Turkey, 2-5 May 2018

  • Keywords: clustering, density based algorithm, automatic SNN algorithm
Clustering is defined as the unsupervised classification of patterns into groups (clusters). Grouping data by similarity is a complex process that cannot be done manually. The literature contains various clustering algorithms based on different principles. The SNN (Shared Nearest Neighbor) algorithm is a density-based clustering algorithm that identifies similarities between data points by looking at the nearest neighbors that two points share. The SNN algorithm relies on two user-supplied parameters: the radius (Eps), which limits the neighborhood of a point, and the minimum number of points (MinPts) that must lie within an Eps-neighborhood. As a result, clustering performance depends on the user's experience. A rule-based automatic SNN algorithm has been proposed to remove this dependency on the user. In this study, the performance of the rule-based automatic SNN algorithm on data sets with 2000 or more samples is examined and presented.
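
To make the shared-nearest-neighbor idea in the abstract concrete, the following is a minimal Python sketch of the classic SNN similarity and density computation, not the rule-based automatic variant studied in the paper (its rules are not described here). The function names, the neighbor-list size `k`, the toy points, and the choice to treat Eps as a threshold on shared-neighbor counts are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """SNN similarity: how many nearest neighbors points i and j share."""
    return len(neighbors[i] & neighbors[j])


def snn_density(neighbors, i, eps):
    """Count the points whose SNN similarity with point i is at least eps."""
    return sum(
        1
        for j in range(len(neighbors))
        if j != i and snn_similarity(neighbors, i, j) >= eps
    )


# Hypothetical toy data: two well-separated groups of four points each.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]

neighbors = knn_lists(points, k=3)

# Assumed, hand-picked values; choosing them by hand is exactly the
# user dependency the abstract describes.
eps, min_pts = 2, 2

# A point is treated as a core point when its SNN density reaches min_pts.
core = [
    i for i in range(len(points))
    if snn_density(neighbors, i, eps) >= min_pts
]
```

In this toy setting, points in the same group share neighbors while points in different groups share none, so the similarity separates the two groups. Changing `k`, `eps`, or `min_pts` by hand changes which points count as core, which illustrates why an automatic choice of parameters is attractive.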