Assignment 3.2: Training error and test error in single-variable polynomial regression

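The explanation on YouTube is at around the 40-minute mark of Lecture 4 (1).
Thirty training points are generated from $y = \cos(1.5\pi x)$ with added noise $0.1 \times N(0,1)$, and polynomial regression is fitted to them.
Cross-validation is introduced in this assignment.
The regression is run for each degree from 1 to 20 in turn.
The training data are generated in the script itself, with the random seed fixed at 0.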

Source code

Homework_3.2.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures as PF
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

DEGREE = 20

def true_f(x):
    return np.cos(1.5 * x * np.pi)

np.random.seed(0)
n_samples = 30

# x-axis data for plotting
x_plot = np.linspace(0,1,100)
# Training data
x_tr = np.sort(np.random.rand(n_samples))
y_tr = true_f(x_tr) + np.random.randn(n_samples) * 0.1
# Reshape to 2-D column matrices, as sklearn expects
X_tr = x_tr.reshape(-1,1)
X_plot = x_plot.reshape(-1,1)

for degree in range(1,DEGREE+1):
    plt.scatter(x_tr,y_tr,label="Training Samples")
    plt.plot(x_plot,true_f(x_plot),label="True")
    plt.xlim(0,1)
    plt.ylim(-2,2)
    filename = f"{degree}.png"
    pf = PF(degree=degree,include_bias=False)
    linear_reg = linear_model.LinearRegression()
    steps = [("Polynomial_Features",pf),("Linear_Regression",linear_reg)]
    pipeline = Pipeline(steps=steps)
    pipeline.fit(X_tr,y_tr)
    plt.plot(x_plot,pipeline.predict(X_plot),label="Model")
    y_predict = pipeline.predict(X_tr)
    mse = mean_squared_error(y_tr,y_predict)
    scores = cross_val_score(pipeline,X_tr,y_tr,scoring="neg_mean_squared_error",cv=10)
    plt.title(f"Degree: {degree} TrainErr: {mse:.2e} TestErr: {-scores.mean():.2e}(+/- {scores.std():.2e})")
    plt.legend()
    plt.savefig(filename)
    plt.clf()

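In the previous assignment 3.1, I first prepared $x, x^2, x^3$ and so on with PolynomialFeatures and then ran LinearRegression as a separate step, but I learned that a Pipeline lets you do both in one go.
In fact, the source code shown in the explanation video for assignment 3.1 used a pipeline.
There is nothing difficult about it: you simply list the processing stages in steps.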

steps = [("Polynomial_Features",pf),("Linear_Regression",linear_reg)]
pipeline = Pipeline(steps=steps)
pipeline.fit(X_tr,y_tr)
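To make the "one go" point concrete, the following minimal sketch fits the same model twice, once with the two explicit steps and once with a Pipeline, and checks that the predictions agree. The degree of 3 and the variable names `manual_pred` and `pipeline_pred` are arbitrary choices for illustration, not part of the original script.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data recipe as Homework_3.2.py
np.random.seed(0)
x_tr = np.sort(np.random.rand(30))
y_tr = np.cos(1.5 * np.pi * x_tr) + np.random.randn(30) * 0.1
X_tr = x_tr.reshape(-1, 1)

degree = 3  # assumption: any degree would do for this comparison

# Two explicit steps: expand the features, then fit the linear model
pf = PolynomialFeatures(degree=degree, include_bias=False)
X_poly = pf.fit_transform(X_tr)
manual_model = LinearRegression().fit(X_poly, y_tr)
manual_pred = manual_model.predict(X_poly)

# The same two stages chained in a Pipeline
pipeline = Pipeline(steps=[
    ("Polynomial_Features", PolynomialFeatures(degree=degree, include_bias=False)),
    ("Linear_Regression", LinearRegression()),
])
pipeline.fit(X_tr, y_tr)
pipeline_pred = pipeline.predict(X_tr)

# Both routes should give the same predictions up to floating-point error
print(np.allclose(manual_pred, pipeline_pred))
```

The practical benefit is that the pipeline object can be handed to other sklearn utilities as a single estimator, so the feature expansion is refitted on whatever subset of the data the utility passes in.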

Apart from this, the other difference from assignment 3.1 is that cross-validation is included.
In the program, it is this line.

scores = cross_val_score(pipeline,X_tr,y_tr,scoring="neg_mean_squared_error",cv=10)

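With cv=10, the data are split into 10 parts; each part in turn is held out as test data while the model is fitted on the remaining nine, and the test error is evaluated on the held-out part.
Because the scoring is neg_mean_squared_error, the returned scores are negative MSE values, which is why the plot title prints -scores.mean().
As a rule, the model with the smaller test error is the better one.
Running the program creates 20 graph files, 1.png through 20.png.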
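What the cross-validation call does can be sketched by hand. For a regressor, passing an integer `cv` makes `cross_val_score` use an unshuffled `KFold`, so the loop below should reproduce its per-fold scores. The degree of 3 and the name `fold_mse` are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data recipe as Homework_3.2.py
np.random.seed(0)
x_tr = np.sort(np.random.rand(30))
y_tr = np.cos(1.5 * np.pi * x_tr) + np.random.randn(30) * 0.1
X_tr = x_tr.reshape(-1, 1)

pipeline = Pipeline(steps=[
    ("Polynomial_Features", PolynomialFeatures(degree=3, include_bias=False)),
    ("Linear_Regression", LinearRegression()),
])

# Library call, as in the script
scores = cross_val_score(pipeline, X_tr, y_tr,
                         scoring="neg_mean_squared_error", cv=10)

# Manual equivalent: 10 unshuffled folds of 3 points each
fold_mse = []
for train_idx, test_idx in KFold(n_splits=10).split(X_tr):
    model = clone(pipeline).fit(X_tr[train_idx], y_tr[train_idx])
    y_pred = model.predict(X_tr[test_idx])
    fold_mse.append(mean_squared_error(y_tr[test_idx], y_pred))
fold_mse = np.array(fold_mse)

# The scores are negated MSE, so flip the sign before comparing
print(np.allclose(-scores, fold_mse))
```

One thing worth noticing: since x_tr is sorted and the folds are not shuffled, each held-out fold consists of three neighbouring x values, and the first and last folds require the model to extrapolate slightly beyond the points it was fitted on.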

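Smallest training error: degree 20

Smallest test error: degree 3

This shows how harmful overfitting is: the degree-20 model fits the training points best, yet it generalizes worse than the degree-3 model.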
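Rather than reading the two winners off 20 plot titles, the errors can be collected in dictionaries and compared directly. This is a sketch under the same data recipe as the script; the names `train_err`, `cv_err`, `best_train` and `best_cv` are hypothetical, and the exact values for the higher degrees may vary slightly with the numerical library in use.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data recipe as Homework_3.2.py
np.random.seed(0)
x_tr = np.sort(np.random.rand(30))
y_tr = np.cos(1.5 * np.pi * x_tr) + np.random.randn(30) * 0.1
X_tr = x_tr.reshape(-1, 1)

train_err = {}  # degree -> MSE on the training data
cv_err = {}     # degree -> mean 10-fold cross-validated MSE

for degree in range(1, 21):
    pipeline = Pipeline(steps=[
        ("Polynomial_Features", PolynomialFeatures(degree=degree, include_bias=False)),
        ("Linear_Regression", LinearRegression()),
    ])
    pipeline.fit(X_tr, y_tr)
    train_err[degree] = mean_squared_error(y_tr, pipeline.predict(X_tr))
    scores = cross_val_score(pipeline, X_tr, y_tr,
                             scoring="neg_mean_squared_error", cv=10)
    cv_err[degree] = -scores.mean()

# Degree with the smallest error of each kind
best_train = min(train_err, key=train_err.get)
best_cv = min(cv_err, key=cv_err.get)
print(f"smallest training error: degree {best_train}")
print(f"smallest test (CV) error: degree {best_cv}")
```

Printing `train_err` and `cv_err` side by side also shows the typical pattern: the training error keeps shrinking as the degree grows, while the cross-validated error bottoms out at a low degree and then rises.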

Reference

Original article: "University of Tsukuba machine learning course: studying sklearn while writing the Python script part of the assignments (3)", https://qiita.com/legacyworld/items/c9cd7865efb343c42e5e
