A Machine-Learning Beginner Tried an RBM

Tags: Python, Python3, scikit-learn, machine learning

Introduction...

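This is my first post on Qiita as a beginner.
If it's hard to read, or you think "anyone could do this!", please forgive me...
For the record, this was done by someone with zero knowledge of machine learning or AI. (When I started, I was at the level of "What's an epoch? What's accuracy?" lol)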

What is an RBM?

RBM (Restricted Boltzmann machine)
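A model used for pre-training in deep learning; together with the better-known autoencoder, it is one of the two standard pre-training models. It has its roots in statistical mechanics, and the model was devised between 1984 and 1986. Unlike an autoencoder, whose output is determined deterministically from its input, an RBM is a generative model, so you can reason about it in terms of probability distributions. This is why it is known as a very versatile model.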
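
For reference, here is the standard binary RBM that scikit-learn's BernoulliRBM models, with visible units v, hidden units h, visible biases a, hidden biases b, and weights W. This notation is my own addition, not from the original post:

```latex
\begin{aligned}
E(\mathbf{v}, \mathbf{h}) &= -\mathbf{a}^\top \mathbf{v} - \mathbf{b}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h} \\
p(\mathbf{v}, \mathbf{h}) &= \frac{1}{Z} e^{-E(\mathbf{v}, \mathbf{h})}, \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})} \\
p(h_j = 1 \mid \mathbf{v}) &= \sigma\Big(b_j + \sum_i W_{ij} v_i\Big) \\
p(v_i = 1 \mid \mathbf{h}) &= \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
\end{aligned}
```

Because there are no connections inside a layer (the "restricted" part), the hidden units are conditionally independent given v, and vice versa. That is what lets the model be sampled one whole layer at a time, instead of producing a single deterministic output the way an autoencoder does.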

"Deep Learning Through the Lens of RBMs ~with a side of black magic~"

I didn't really get it...

So I looked into all sorts of things!

And then...
It turns out scikit-learn has something called BernoulliRBM!
I had no choice but to use it.
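
Before the full MNIST run, here is a minimal sketch of the basic BernoulliRBM API on toy data. The random binary matrix and the small sizes are my own assumptions, chosen only so it runs in about a second. It shows that transform() gives hidden-unit probabilities and that the learned weights live in components_, one row per hidden unit.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy data (assumption): random 0/1 vectors standing in for binarized images.
# BernoulliRBM expects every feature to lie in [0, 1].
rng = np.random.RandomState(0)
X = (rng.rand(200, 16) > 0.5).astype(np.float64)  # 200 samples, 16 visible units

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, batch_size=10,
                   n_iter=5, random_state=0)
rbm.fit(X)

# transform() returns P(h_j = 1 | v) for each hidden unit: probabilities, not hard 0/1
H = rbm.transform(X)
print(H.shape)                       # (200, 8)

# Learned parameters: one weight row per hidden unit, plus a bias vector per layer
print(rbm.components_.shape)         # (8, 16)
print(rbm.intercept_hidden_.shape)   # (8,)
print(rbm.intercept_visible_.shape)  # (16,)

# gibbs() performs one block-Gibbs step (v -> h -> v) and returns sampled binary visibles
v_sample = rbm.gibbs(X[:3])
print(v_sample.shape)                # (3, 16)
```

Note that components_ is stored as (n_components, n_features), i.e. the transpose of the W used in the equations above.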

Implementation

First, let's run it on the MNIST dataset!

RBM.ipynb

import numpy as np
import matplotlib.pyplot as plt

from scipy.ndimage import convolve
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.base import clone

# Datasets used
from keras.datasets import fashion_mnist
from keras.datasets import mnist

The imports look roughly like the above.

From here on, I followed the scikit-learn example as written (there are many parts where I don't know what the processing is actually doing).

RBM.ipynb

def nudge_dataset(X, Y):
    """
    This produces a dataset 5 times bigger than the original one,
    by moving the 8x8 images in X around by 1px to left, right, down, up
    """
    direction_vectors = [
        [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [1, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 0],
         [0, 1, 0]]]

    def shift(x, w):
        return convolve(x.reshape((8, 8)), mode='constant', weights=w).ravel()

    X = np.concatenate([X] +
                       [np.apply_along_axis(shift, 1, X, vector)
                        for vector in direction_vectors])
    Y = np.concatenate([Y for _ in range(5)], axis=0)
    return X, Y


# Load Data
# (x_train, y_train), (x_test, y_test) = mnist.load_data()
# X, y = mnist.load_data()
# X = np.asarray(X, 'float32')
# X, Y = nudge_dataset(X, y)
# X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

# X_train, X_test, Y_train, Y_test = train_test_split(
#     X, Y, test_size=0.2, random_state=0)

# (X_train, Y_train), (X_test, Y_test) = fashion_mnist.load_data()
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

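# Scale 8-bit pixel values (0-255) to [0, 1], as BernoulliRBM expects
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
# Flatten each 28x28 image into a 784-dimensional vector
X_train = X_train.reshape((len(X_train), np.prod(X_train.shape[1:])))
X_test = X_test.reshape((len(X_test), np.prod(X_test.shape[1:])))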

# Models we will use
logistic = linear_model.LogisticRegression(solver='newton-cg', tol=1)
rbm = BernoulliRBM(random_state=0, verbose=True)

rbm_features_classifier = Pipeline(
    steps=[('rbm', rbm), ('logistic', logistic)])

# #############################################################################
# Training

# Hyper-parameters. In the original scikit-learn example these were set by
# cross-validation (GridSearchCV) on the small 8x8 digits dataset; they are
# reused here for MNIST without re-tuning, to save time.
rbm.learning_rate = 0.06
rbm.n_iter = 10
# More components tend to give better prediction performance, but larger
# fitting time
rbm.n_components = 100
logistic.C = 6000

# Training RBM-Logistic Pipeline
rbm_features_classifier.fit(X_train, Y_train)

# Training the Logistic regression classifier directly on the pixel
raw_pixel_classifier = clone(logistic)
raw_pixel_classifier.C = 100.
raw_pixel_classifier.fit(X_train, Y_train)

# #############################################################################
# Evaluation

Y_pred = rbm_features_classifier.predict(X_test)
print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(Y_test, Y_pred)))

Y_pred = raw_pixel_classifier.predict(X_test)
print("Logistic regression using raw pixel features:\n%s\n" % (
    metrics.classification_report(Y_test, Y_pred)))
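
The nudge_dataset helper above (from the scikit-learn example, and only used in the commented-out lines here) is easier to follow on a single lit pixel. The sketch below reuses the same four kernels and the same shift function on one 8x8 image and records where the pixel ends up. scipy.ndimage.convolve flips the kernel, so each kernel moves the image one pixel toward the side where its 1 sits.

```python
import numpy as np
from scipy.ndimage import convolve

# The same four 3x3 kernels used in nudge_dataset
direction_vectors = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
]

def shift(x, w):
    # Same helper as in nudge_dataset: treat x as an 8x8 image, pad the border with 0
    return convolve(x.reshape((8, 8)), mode='constant', weights=w).ravel()

img = np.zeros(64)
img[3 * 8 + 4] = 1.0  # a single lit pixel at row 3, column 4 of the 8x8 grid

positions = []
for w in direction_vectors:
    moved = shift(img, w).reshape(8, 8)
    row, col = np.argwhere(moved == 1.0)[0]
    positions.append((int(row), int(col)))

print(positions)  # [(2, 4), (3, 3), (3, 5), (4, 4)] -> up, left, right, down
```

Stacking the original images with these four shifted copies is why the docstring says the dataset becomes 5 times bigger. Because the helper hard-codes reshape((8, 8)), it only works on the 8x8 digits images and cannot be applied to 28x28 MNIST as-is.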

Sorry the source code is messy...
I really should have deleted the commented-out parts... writing clean code is hard...

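This uses BernoulliRBM from the sklearn library. In this example, a classification pipeline is built from a BernoulliRBM feature extractor followed by a LogisticRegression classifier. For comparison, logistic regression trained directly on the raw pixel values is also shown.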
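
The same RBM + logistic-regression setup can be tried without downloading MNIST, on scikit-learn's small built-in 8x8 digits dataset (the dataset the original scikit-learn example is built around). This is only a sketch: the hyperparameters are copied from the code above without re-tuning, max_iter=1000 is my own choice so the default lbfgs solver has room to converge, and I am not quoting any accuracy numbers for it.

```python
from sklearn import datasets, linear_model, metrics
from sklearn.base import clone
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

# Built-in 8x8 digits (1797 samples, 64 features), scaled to [0, 1] as BernoulliRBM expects
X, y = datasets.load_digits(return_X_y=True)
X = minmax_scale(X.astype("float32"), feature_range=(0, 1))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# RBM as an unsupervised feature extractor, followed by a linear classifier
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=10, random_state=0)
logistic = linear_model.LogisticRegression(C=6000, max_iter=1000)
rbm_pipeline = Pipeline(steps=[("rbm", rbm), ("logistic", logistic)])
rbm_pipeline.fit(X_train, y_train)  # the RBM step ignores y; only the logistic step uses it

# Baseline: logistic regression on the raw pixels
raw_pixel_classifier = clone(logistic).set_params(C=100.0)
raw_pixel_classifier.fit(X_train, y_train)

rbm_acc = metrics.accuracy_score(y_test, rbm_pipeline.predict(X_test))
raw_acc = metrics.accuracy_score(y_test, raw_pixel_classifier.predict(X_test))
print(f"RBM features: {rbm_acc:.3f}")
print(f"Raw pixels:   {raw_acc:.3f}")
```

Because it runs in seconds, this is a convenient place to experiment with n_components or n_iter before spending time on the full MNIST run.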

RBM source code (sklearn)
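
According to the scikit-learn documentation, BernoulliRBM is trained with Stochastic Maximum Likelihood, also known as Persistent Contrastive Divergence (PCD). The sketch below is my own simplified NumPy version of one PCD mini-batch update, written only to show the idea. It is not scikit-learn's source, and the helper pcd_step and all variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_step(v_pos, W, b_vis, b_hid, v_chain, lr, rng):
    """One PCD update on a mini-batch (hypothetical helper, not sklearn's code).

    v_pos   : (batch, n_visible) data in [0, 1]
    W       : (n_visible, n_hidden) weights
    v_chain : (n_chains, n_visible) persistent "fantasy" samples kept between calls
    """
    # Positive phase: hidden probabilities driven by the data
    h_pos = sigmoid(v_pos @ W + b_hid)

    # Negative phase: advance the persistent chain by one block-Gibbs step (v -> h -> v)
    h_prob = sigmoid(v_chain @ W + b_hid)
    h_sample = (rng.random_sample(h_prob.shape) < h_prob).astype(float)
    v_prob = sigmoid(h_sample @ W.T + b_vis)
    v_chain = (rng.random_sample(v_prob.shape) < v_prob).astype(float)
    h_neg = sigmoid(v_chain @ W + b_hid)

    # Log-likelihood gradient estimate: data statistics minus model (chain) statistics
    W = W + lr * (v_pos.T @ h_pos / len(v_pos) - v_chain.T @ h_neg / len(v_chain))
    b_vis = b_vis + lr * (v_pos.mean(axis=0) - v_chain.mean(axis=0))
    b_hid = b_hid + lr * (h_pos.mean(axis=0) - h_neg.mean(axis=0))
    return W, b_vis, b_hid, v_chain  # the chain is not reset: the "persistent" part

# Toy usage on random binary data (assumption)
rng = np.random.RandomState(0)
data = (rng.random_sample((100, 6)) > 0.5).astype(float)
n_visible, n_hidden, batch = 6, 3, 10
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
v_chain = np.zeros((batch, n_visible))

for epoch in range(5):
    for start in range(0, len(data), batch):
        W, b_vis, b_hid, v_chain = pcd_step(data[start:start + batch], W, b_vis, b_hid,
                                            v_chain, lr=0.05, rng=rng)
print(W.shape)  # (6, 3)
```

The key difference from plain contrastive divergence is that the negative-phase chain continues from where the previous mini-batch left it instead of restarting at the data.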

Results

MNIST ... Accuracy: 0.97
Fashion-MNIST ... Accuracy: 0.79
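
Since "Accuracy" was one of the terms I didn't know at first: it is simply the fraction of test samples whose predicted label matches the true label, and it is the same value classification_report prints in its "accuracy" row. A tiny check with made-up labels (not data from this experiment):

```python
import numpy as np
from sklearn import metrics

# Made-up labels, for illustration only
y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2, 0, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2, 0, 1])

manual = np.mean(y_true == y_pred)            # fraction of exact matches: 8 of 10
library = metrics.accuracy_score(y_true, y_pred)
report = metrics.classification_report(y_true, y_pred, output_dict=True)

print(manual, library, report["accuracy"])    # 0.8 0.8 0.8
```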

Raising the number of epochs makes training take quite a long time, so I set it to 10 (rbm.n_iter = 10). (I didn't have time before the assignment deadline...)

Conclusion

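I got it working by copying the source code, but tracking it down was a hassle... I realized I need to understand it better!
Also, I think it would be easier to understand if I could display things like a graph of the loss function.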
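
On the idea of a loss graph: BernoulliRBM doesn't keep a loss history, but with verbose=True it prints the pseudo-likelihood after each iteration, and score_samples() returns that pseudo-likelihood per sample. One way to get a curve, sketched here on the built-in digits data (my assumption, not what the original post ran), is to train epoch by epoch with partial_fit on mini-batches and record the mean score_samples() after each epoch. The helper names pseudo_likelihood_curve and plot_curve are hypothetical, and pseudo-likelihood is only a noisy proxy for the true log-likelihood.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import minmax_scale

def pseudo_likelihood_curve(X, n_epochs=10, batch_size=10, **rbm_params):
    """Train a BernoulliRBM epoch by epoch; return (rbm, per-epoch mean pseudo-likelihood)."""
    rbm = BernoulliRBM(batch_size=batch_size, **rbm_params)
    rng = np.random.RandomState(0)
    curve = []
    for _ in range(n_epochs):
        order = rng.permutation(len(X))  # shuffle the samples each epoch
        for start in range(0, len(X), batch_size):
            # partial_fit performs one gradient update on the given mini-batch
            rbm.partial_fit(X[order[start:start + batch_size]])
        curve.append(rbm.score_samples(X).mean())
    return rbm, curve

def plot_curve(curve):
    """Plot the per-epoch curve; matplotlib is imported here so the rest runs without it."""
    import matplotlib.pyplot as plt
    plt.plot(range(1, len(curve) + 1), curve, marker="o")
    plt.xlabel("epoch")
    plt.ylabel("mean pseudo-likelihood (higher is better)")
    plt.show()

X = minmax_scale(load_digits().data, feature_range=(0, 1))
rbm, curve = pseudo_likelihood_curve(X, n_epochs=10, n_components=100,
                                     learning_rate=0.06, random_state=0)
```

In a notebook, calling plot_curve(curve) draws the graph. The same loop should also work on the flattened MNIST arrays from the code above, but expect it to take much longer.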

I'll keep posting bit by bit. I'm also thinking of writing about Docker and web-related topics...
Thanks for reading, and I look forward to your support.

Reference

The original post (A Machine-Learning Beginner Tried an RBM) is available at https://qiita.com/Hiroshi_Sakai_flat/items/4dac4ddb8e8c6fb1c9d0
