Solving the iris problem with scikit-learn ver1.0 (logistic regression)

Python, scikit-learn, machine learning, numpy

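1. Introduction

Predicting the species of an iris is a tutorial that almost everyone learning machine learning passes through. I am recording the approach I took here as a memo.

The versions I used are as follows.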

Python 3.7.6

numpy 1.18.1

pandas 1.0.1

matplotlib 3.1.3

seaborn 0.10.0

scikit-learn 0.22.1

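2. What is iris classification?

2-1 Overview of the iris problem

There are three iris species, called "setosa", "versicolor", and "virginica". Each flower is described by four measurements: the length and width of its sepals and the length and width of its petals.
The task is to predict which of the three species a flower belongs to from these four features.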
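
As a minimal sketch of what "four features, three species" looks like in practice, the dataset bundled with scikit-learn can be inspected directly. The variable names below (`iris_bunch`, `feature_names`, `species_names`, `class_counts`) are chosen for this illustration and are not part of the original program.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_bunch = load_iris()  # variable name chosen for this sketch

# Names of the four features and the three species, as stored in the dataset
feature_names = [str(name) for name in iris_bunch.feature_names]
species_names = [str(name) for name in iris_bunch.target_names]
print(feature_names)
print(species_names)

# Number of samples for each species label (0, 1, 2)
class_counts = np.bincount(iris_bunch.target)
print(class_counts)
```

The species are stored as the integer labels 0, 1, and 2, in the same order as `target_names`.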

2-2 About the program

Importing libraries


import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

%matplotlib inline

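from sklearn.datasets import load_iris

# Load the dataset: x holds the four features, y holds the species labels (0, 1, 2)
iris = load_iris()
x = iris.data
y = iris.target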

Here I import numpy, pandas, matplotlib, seaborn, and sklearn.
The iris dataset is loaded from sklearn.datasets.

Looking at the data


iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_data

There are 150 samples. The length and width of the sepal and the petal are recorded in cm.
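
The sample count and the unit can be checked without reading the whole table. The following is a small sketch, assuming the dataset's own column names are used instead of the hand-written ones above; `bunch` and `check_df` are names introduced only for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()
# Use the dataset's own feature names, which include the unit
check_df = pd.DataFrame(bunch.data, columns=bunch.feature_names)

print(check_df.shape)          # (number of rows, number of columns)
print(list(check_df.columns))  # column names as shipped with the dataset
print(check_df.describe())     # count, mean, min, max, ... for each feature
```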
　

Next, let's look at the species.


iris_target = DataFrame(y, columns =['Species'])
iris_target

The species are already stored as numbers rather than names. Processing could continue as is, but having to remember which number corresponds to which name is a nuisance, so I map the numbers to names.


# Define a function that maps each numeric label to its species name
def flower(num):
    if num == 0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'
iris_target['Species'] = iris_target['Species'].apply(flower)
iris_target

Now that the species have names, the data are easier to read.

Checking the relationships between variables


iris = pd.concat([iris_data, iris_target], axis=1)
sns.pairplot(iris, hue='Species', hue_order=['Virginica', 'Versicolor', 'Setosa'], height=2, palette="husl")

This plots the relationship between each pair of variables. With seaborn's pairplot function it can be written in one line.
Viewed this way, Setosa is clearly distinct from the other two species. Virginica and Versicolor, on the other hand, occupy similar ranges of Sepal Length.

Looking at the actual flowers, Setosa is indeed the one with the smaller flowers overall.
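
One way to back up the visual impression with numbers is to compare the per-species mean of each feature. This is only a sketch: it rebuilds the table from scratch so that it runs on its own, and it uses the lowercase species names stored in the dataset rather than the capitalized names used in this article. The names `bunch`, `df`, and `species_means` are introduced for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

bunch = load_iris()
df = pd.DataFrame(
    bunch.data,
    columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'],
)
# Attach the species name for each row using the dataset's own name table
df['Species'] = [str(bunch.target_names[label]) for label in bunch.target]

# Mean of every feature within each species
species_means = df.groupby('Species').mean()
print(species_means)
```

Comparing the rows of `species_means` shows which measurements separate the species most clearly.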

2-3 Prediction with logistic regression


# Import LogisticRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
logreg = LogisticRegression()

# Hold out 30% of the data as test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)
logreg.fit(x_train, y_train)

# Compute the accuracy (accuracy_score) on the test data
from sklearn import metrics
y_pred = logreg.predict(x_test)
metrics.accuracy_score(y_test, y_pred)

Accuracy: 0.97777777777777777
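
A single accuracy figure does not show which species get confused with each other. As a sketch, a confusion matrix on the same split can be computed as follows. Two things here are assumptions of this example and not part of the original program: `max_iter=1000`, which gives the solver more room to converge, and the variable names `clf` and `cm`. The exact counts may differ between scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=3
)
print(x_train.shape, x_test.shape)  # sizes of the training and test sets

# max_iter=1000 is an assumption of this sketch, not the original setting
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Rows are the true species, columns are the predicted species
cm = metrics.confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

Off-diagonal entries count the test samples that were assigned to the wrong species.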

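Here I used logistic regression for the analysis. Logistic regression is a regression in which the target variable takes one of two values, 0 or 1. In other words, it is a way of deciding between two outcomes, such as "real" or "fake", or "benign" or "malignant".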
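
To make the two-class case concrete, the sketch below fits a logistic regression on only two of the three species and checks that the predicted probability is the sigmoid of a linear score. Restricting the data to versicolor and virginica, the `max_iter=1000` setting, and all variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

x, y = load_iris(return_X_y=True)

# Keep only labels 1 and 2 so that the target is binary
mask = y != 0
x_bin = x[mask]
y_bin = (y[mask] == 2).astype(int)  # 1 = virginica, 0 = versicolor

binary_clf = LogisticRegression(max_iter=1000)
binary_clf.fit(x_bin, y_bin)

# Linear score (weights times features, plus intercept) for each sample
scores = binary_clf.decision_function(x_bin)
# The sigmoid squashes the score into a probability between 0 and 1
sigmoid = 1.0 / (1.0 + np.exp(-scores))
proba_class1 = binary_clf.predict_proba(x_bin)[:, 1]

print(np.allclose(sigmoid, proba_class1))
```

A sample is assigned to class 1 when this probability exceeds 0.5, which is the same as the linear score being positive.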

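In this case I applied it to a three-way classification. Logistic regression can also be applied to multiclass problems with three or more classes. Conceptually, a problem with more than two classes is broken down into calculations of the same form as the two-class case.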
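
One common reading of "breaking the problem down into two-class calculations" is the one-vs-rest scheme, sketched below with scikit-learn's `OneVsRestClassifier`. This is an illustration of that scheme, not necessarily what `LogisticRegression()` does internally: depending on the scikit-learn version and solver, it may fit a single multinomial (softmax) model instead. The `max_iter=1000` setting and the name `ovr` are assumptions of this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

x, y = load_iris(return_X_y=True)

# One binary logistic regression per species: "this species" vs "all others"
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(x, y)

print(len(ovr.estimators_))  # number of fitted binary models
print(ovr.predict(x[:5]))    # predicted labels for the first five samples
```

At prediction time, the species whose binary model gives the highest score is chosen.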
　

In this case the accuracy was 97.8%, which suggests that this approach works well for this problem.
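
Because the figure above comes from a single train/test split, it can shift with a different `random_state`. A hedged way to check how stable it is would be k-fold cross-validation, sketched below. The choice of 5 folds, `max_iter=1000`, and the variable name `scores` are assumptions of this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

x, y = load_iris(return_X_y=True)

# Accuracy on each of 5 folds; every sample is used for testing exactly once
scores = cross_val_score(LogisticRegression(max_iter=1000), x, y, cv=5)
print(scores)
print(scores.mean())
```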
　


3. Full program


import numpy as np
import pandas as pd
from pandas import Series,DataFrame

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

%matplotlib inline

from sklearn.datasets import load_iris
iris = load_iris()
x =iris.data
y=iris.target

iris_data = DataFrame(x, columns=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
iris_target = DataFrame(y, columns =['Species'])

def flower(num):
    if num ==0:
        return 'Setosa'
    elif num == 1:
        return 'Versicolor'
    else:
        return 'Virginica'

iris_target['Species'] = iris_target['Species'].apply(flower)

iris = pd.concat([iris_data, iris_target], axis=1)

# Import logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

logreg = LogisticRegression()
x_train, x_test, y_train, y_test =train_test_split(x,y,test_size=0.3, random_state=3)

logreg.fit(x_train, y_train)

from sklearn import metrics
y_pred  =logreg.predict(x_test)


metrics.accuracy_score(y_test, y_pred)

Reference

Original article for this post (Solving the iris problem with scikit-learn ver1.0 (logistic regression)): https://qiita.com/Fumio-eisan/items/b7f1033cf9a0a208aeb4
