Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I have always liked DBSCAN: https://en.wikipedia.org/wiki/DBSCAN

DBSCAN is used for clustering in applications, much like k-means clustering, but in k-means clustering, the number of clusters must be known in advance (hence the k)

DBSCAN solves this problem by for every point in the graph checking if a certain threshold of vertices is within a certain radius. If this is the case, it will add these vertices to the new cluster and repeat the process for these nodes. It is very simple, so much that you can easily explain this to non-technical people.

In my previous job I used it for detecting whether for one word in a signal there only existed one representation or multiple (used for toggle bit detection)



If you like DBSCAN, I would recommend checking out OPTICS. It is not a clustering algorithm per se, but generates information from which clusterings for multiple settings of DBSCAN parameters can be "queried" cheaply. The DBSCAN and OPTICS papers share an author. One my favorite algorithms. This is the original paper - [1], and this looks like a helpful presentation on the topic - [2]. Since you mention k-means, I would point out that unlike k-means which finds convex-shaped clusters only, DBSCAN/OPTICS identify clusters of any shape.

[1] http://www.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

[2] https://www.cse.buffalo.edu/faculty/azhang/cse601/density-ba...


That definitely looks very interesting, I'll look into that.

Clustering on non linearly separable clusters is also one of the reasons we chose DBSCAN back then :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: