[Machine Learning] [Hierarchical Clustering - 2] A Python Implementation of the Hierarchical Clustering Algorithm


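The hierarchical clustering algorithm looks simple, but implementing it takes some thought about data structures. Settling on the right ones is not as easy as it seems, as anyone who has implemented it will know.
The Python code below is my own implementation, and its output matches the hand-calculated results exactly, so you can read the code with confidence.
It is entirely hand-written. The code follows.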
1. How the hierarchical clustering algorithm is computed
See the previous post: https://blog.csdn.net/u012421852/article/details/80531184
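
For readers without the previous post at hand, the linkage criterion used by the code in this post is average linkage, as the worked example in the `AverageLinkage` method shows. Written in general form, and assuming Euclidean distance between points (which is what the `em` method computes), it is:

```latex
d_{\mathrm{avg}}(A, B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} \lVert a - b \rVert_2
```

In each round the two clusters with the smallest $d_{\mathrm{avg}}$ are merged, and the rounds repeat until the desired number of clusters remains (two, in this post's code).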
2. Code

# -*- coding: utf-8 -*-
"""
@author:      Tom
Talk is cheap, show me the code
Aim:         Hierarchical Clustering Alg
"""

import numpy as np
    
class CHC(object):
    '''
    Hierarchical Clustering Algorithm (HCA): agglomerative, average linkage
    '''
    def __init__(self, samples):
        self.samples = samples
        self.hc = []
        self.cnt = 0
        self._work()

    def indmin_matrix(self, M):
        '''Return the (row, col) index of the smallest element of matrix M'''
        row,col = divmod(np.argmin(M), np.shape(M)[1])
        return row,col
    
    def em(self, A, B):
        '''
        Euclidean distance between vectors A and B,
        e.g. A=[1,1], B=[4,5] gives 5.0
        '''
        efunc = lambda a,b : np.power(float(a)-float(b),2)
        func = np.frompyfunc(efunc, 2, 1)
        em = np.sqrt(sum(func(A,B)))
        return em
    
    def AverageLinkage(self, A, B):
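        '''
        Average-linkage distance between clusters A and B:
        the mean distance over every pair of points, one from each cluster.
        a = [1,1], b = [1,1]
        c = [4,5], d = [6,7]
        A=[a,b] B=[c,d]
        AverageLinkage(A,B)=(em(a,c)+em(a,d)+em(b,c)+em(b,d))/4
        '''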
        total = 0.0
        for i in A:
            for j in B:
                total += self.em(i,j)
        ret = total / (np.shape(A)[0]*np.shape(B)[0])
        return ret
        
    def _work(self):
        self.cnt += 1
        print('\n\n=====================%d times Hierarical Clustring=================='%self.cnt)
        # First pass: start with every sample in its own cluster
        if not self.hc:
            initData = [[i] for i in range(len(self.samples))]
            self.hc = [initData]
            print('init self.hc:', self.hc)
        # len() rather than np.shape(): the cluster lists are ragged
        preHC, n = self.hc[-1], len(self.hc[-1])
        print('preHC:',preHC)
        # Stop once only 2 clusters remain
        if n <= 2:
            print('succeed hierarical clustring:\n')
            for level in self.hc:
                print(level)
            return self.hc
        # Build the cluster-to-cluster distance matrix (upper triangle only)
        dist = np.full(shape=(n,n), fill_value=np.inf)
        value = np.array(self.samples)[:,-1]
        for i in range(n):
            for j in np.arange(start=i+1, stop=n, step=1):
                A,B = value[preHC[i]], value[preHC[j]]
                dist[i,j] = self.AverageLinkage(A,B)
        print('dist:\n', dist)
        # Merge the two closest clusters
        row,col = self.indmin_matrix(dist)
        C = []
        newHC = []
        for i in range(n):
            if row == i or col == i:
                if not C:
                    C = preHC[row] + preHC[col]
                    newHC.append(C)
                continue
            newHC.append(preHC[i])
        # Record the new clustering in the history
        self.hc.append(newHC)
        for level in self.hc:
            print(level)
        return self._work()
    
if __name__=='__main__':   
    srcData = [[['A'], [16.9]],
                [['B'], [38.5]],
                [['C'], [39.5]],
                [['D'], [80.8]],
                [['E'], [82]],
                [['F'], [34.6]],
                [['G'], [116.1]]]
    hc = CHC(srcData)
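
The class above only stores sample indices, so the final clusters still have to be mapped back to the labels 'A' to 'G'. The following is a minimal sketch of the same merge procedure written as a loop instead of recursion, using only the standard library. The names `average_linkage` and `cluster_history` and the `stop_at` parameter are hypothetical helpers introduced for illustration, not part of the original class, and the sketch assumes one-dimensional samples as in `srcData`.

```python
from itertools import product

def average_linkage(values, a, b):
    """Mean absolute difference over all index pairs drawn from clusters a and b."""
    return sum(abs(values[i] - values[j]) for i, j in product(a, b)) / (len(a) * len(b))

def cluster_history(values, stop_at=2):
    """Repeatedly merge the two closest clusters; return every intermediate partition."""
    clusters = [[i] for i in range(len(values))]
    history = [clusters]
    while len(clusters) > stop_at:
        # Candidate pairs (r, c) with r < c, in the same row-major order as the dist matrix.
        pairs = [(r, c) for r in range(len(clusters)) for c in range(r + 1, len(clusters))]
        r, c = min(pairs, key=lambda p: average_linkage(values, clusters[p[0]], clusters[p[1]]))
        merged = clusters[r] + clusters[c]
        # The merged cluster takes the position of cluster r; cluster c is dropped.
        clusters = [merged if k == r else cl for k, cl in enumerate(clusters) if k != c]
        history.append(clusters)
    return history

labels = ["A", "B", "C", "D", "E", "F", "G"]
values = [16.9, 38.5, 39.5, 80.8, 82.0, 34.6, 116.1]
history = cluster_history(values)
final = [[labels[i] for i in cluster] for cluster in history[-1]]
print(final)  # [['A', 'B', 'C', 'F'], ['D', 'E', 'G']]
```

A loop avoids growing the call stack by one frame per merge, which may matter for larger sample counts. The history list plays the same role as `self.hc` in the class.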

3. Output

runfile('C:/Users/tom/hierarchical_clustering.py', wdir='C:/Users/tom')

=====================1 times Hierarical Clustring==================
init self.hc: [[[0], [1], [2], [3], [4], [5], [6]]]
preHC: [[0], [1], [2], [3], [4], [5], [6]]
dist:
 [[  inf  21.6  22.6  63.9  65.1  17.7  99.2]
 [  inf   inf   1.   42.3  43.5   3.9  77.6]
 [  inf   inf   inf  41.3  42.5   4.9  76.6]
 [  inf   inf   inf   inf   1.2  46.2  35.3]
 [  inf   inf   inf   inf   inf  47.4  34.1]
 [  inf   inf   inf   inf   inf   inf  81.5]
 [  inf   inf   inf   inf   inf   inf   inf]]
[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]


=====================2 times Hierarical Clustring==================
preHC: [[0], [1, 2], [3], [4], [5], [6]]
dist:
 [[  inf  22.1  63.9  65.1  17.7  99.2]
 [  inf   inf  41.8  43.    4.4  77.1]
 [  inf   inf   inf   1.2  46.2  35.3]
 [  inf   inf   inf   inf  47.4  34.1]
 [  inf   inf   inf   inf   inf  81.5]
 [  inf   inf   inf   inf   inf   inf]]
[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]
[[0], [1, 2], [3, 4], [5], [6]]


=====================3 times Hierarical Clustring==================
preHC: [[0], [1, 2], [3, 4], [5], [6]]
dist:
 [[  inf  22.1  64.5  17.7  99.2]
 [  inf   inf  42.4   4.4  77.1]
 [  inf   inf   inf  46.8  34.7]
 [  inf   inf   inf   inf  81.5]
 [  inf   inf   inf   inf   inf]]
[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]
[[0], [1, 2], [3, 4], [5], [6]]
[[0], [1, 2, 5], [3, 4], [6]]


=====================4 times Hierarical Clustring==================
preHC: [[0], [1, 2, 5], [3, 4], [6]]
dist:
 [[         inf  20.63333333  64.5         99.2       ]
 [         inf          inf  43.86666667  78.56666667]
 [         inf          inf          inf  34.7       ]
 [         inf          inf          inf          inf]]
[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]
[[0], [1, 2], [3, 4], [5], [6]]
[[0], [1, 2, 5], [3, 4], [6]]
[[0, 1, 2, 5], [3, 4], [6]]


=====================5 times Hierarical Clustring==================
preHC: [[0, 1, 2, 5], [3, 4], [6]]
dist:
 [[    inf  49.025  83.725]
 [    inf     inf  34.7  ]
 [    inf     inf     inf]]
[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]
[[0], [1, 2], [3, 4], [5], [6]]
[[0], [1, 2, 5], [3, 4], [6]]
[[0, 1, 2, 5], [3, 4], [6]]
[[0, 1, 2, 5], [3, 4, 6]]


=====================6 times Hierarical Clustring==================
preHC: [[0, 1, 2, 5], [3, 4, 6]]
succeed hierarical clustring:

[[0], [1], [2], [3], [4], [5], [6]]
[[0], [1, 2], [3], [4], [5], [6]]
[[0], [1, 2], [3, 4], [5], [6]]
[[0], [1, 2, 5], [3, 4], [6]]
[[0, 1, 2, 5], [3, 4], [6]]
[[0, 1, 2, 5], [3, 4, 6]]
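
To spot-check the log by hand, individual entries of the printed distance matrices can be recomputed directly from the seven sample values. The snippet below is a small sketch that does this for two entries: the distance between clusters [0] and [1, 2, 5] in round 4, and between [0, 1, 2, 5] and [3, 4] in round 5. The helper name `average_linkage` is an assumption for illustration. Because the samples are one-dimensional, the Euclidean distance reduces to an absolute difference.

```python
import math

values = [16.9, 38.5, 39.5, 80.8, 82.0, 34.6, 116.1]

def average_linkage(a, b):
    """Average of |x - y| over all pairs of sample indices taken from a and b."""
    return sum(abs(values[i] - values[j]) for i in a for j in b) / (len(a) * len(b))

# Round 4: cluster [0] versus cluster [1, 2, 5]
d4 = average_linkage([0], [1, 2, 5])
# Round 5: cluster [0, 1, 2, 5] versus cluster [3, 4]
d5 = average_linkage([0, 1, 2, 5], [3, 4])

# Compare against the values printed in the log, allowing for rounding.
print(math.isclose(d4, 20.63333333, abs_tol=1e-8))
print(math.isclose(d5, 49.025, abs_tol=1e-9))
```

Both comparisons should agree with the log: (21.6 + 22.6 + 17.7) / 3 for round 4, and 392.2 / 8 for round 5.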

(end)


You may freely share or copy this text, but please keep this document's URL as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
