13. Clustering

作者: 玄语梨落 | 来源:发表于2020-08-21 08:10 被阅读0次

Clustering

Unsupervised learing introduction

K-means algorithm

Input:

  • K (number of clusters)
  • Training set \{x^{(1)},x^{(2)},...,x^{(n)}\}\quad x^{(i)}\in R^{n}

K-means algroithm

Randomly initialize K cluster centroids \mu_1,\mu_2,...,\mu_K \in R^{n}

Repeat {
for i = 1 to m
c^{(i)}:=index (from 1 to K) of cluster centroid closest to x^{(i)}
for k=1 to K
\mu_K:=average (mean) of points assigned to cluster k
}

K-means for non-separated clusters

Optimization Objective

K-means optimization objective

c^{(i)} = index of cluster (1,2,...,K) to which example x^{(i)} is currently assigned.
\mu_k = cluster centriod k (\mu_k\in R^{n})
\mu_c^{(i)} = cluster centroid of cluster to which example x^{(i)} has been assigned

Distortion:
J(c^{(i)},...,c^{(m)},\mu_1,...,\mu_k) = \frac{1}{m}\sum_{i=1}^m||x^{(i)}=\mu_c^{(i)}||^2

Random Initalization

  • K < m
  • Randomly pick K trianing examples.
  • Set \mu_1,...,\mu_k equal to these K examples.

Random initialization many times:

For i=1 to 100 {
Randomly initialize K-means.
Run K-means. Get c^{(1)},...,c^{(m)},\mu_1,...,\mu_K
Compute cost function (distortion)
J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K)
}

Pick clustering that gave lowest cost J.

K=2-10, if K is very large, Randomly initilization many times may help less.

Choosing the number of clusters

  • by hand (see the data set's figure)
  • Elbow method: compute cost J form 1-..., and draw a figure of it. choose the number at the elbow.
  • Evaluate K-means based on a metric for how well it performs for that later purpose. Choose a K that serve the purpose.

相关文章

  • 13. Clustering

    Clustering Unsupervised learing introduction K-means algo...

  • 2019-01-12[Stay Sharp]hierarchic

    what is hierarchical clustering ? hierarchical clustering...

  • ML - hw4

    1. Spectral Clustering (a) Spectral Clustering on synthes...

  • 100天持续行动—Day13

    10.26看了K-Means clustering, Hierarchical clustering 和 DBSC...

  • 2019-01-10[Stay Sharp]k-means cl

    what is k-means clustering? K-means clustering is a metho...

  • 漫谈 Clustering

    漫谈 Clustering (1): k-means 漫谈 Clustering (2): k-medoids 漫...

  • Clustering

    We perform clustering because we believe the underlying c...

  • Clustering

    本文结构安排 经典聚类算法:线性聚类 Kmeans 经典聚类算法:非线性聚类 DBSCAN、谱聚类 新兴聚类算法:...

  • Clustering

    Single linkage clustering 类似层次聚类的概念,时间复杂度是O(n^3)。 SLC聚类效果...

  • Clustering

    Clustering 算法概念 算法类型 K-means算法是非监督学习聚类(clustering)中的经典算法,...

网友评论

    本文标题:13. Clustering

    本文链接:https://www.haomeiwen.com/subject/zoxgdktx.html