Simple Regression Data


Loading the dataset


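The dataset contains explanatory variables and a target variable.
Explanatory variable: a variable that acts as a cause
Target variable: a variable that results from that cause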
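import numpy as np
import pandas as pd
from sklearn import datasets

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
boston = datasets.load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df["PRICE"] = boston.target  # target: the target variable
boston_df.head()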
Result 1
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT  PRICE
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98   24.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14   21.6
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03   34.7
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94   33.4
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33   36.2

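The explanatory variables are features of the houses, and the target variable is the price.
The column labels are not self-explanatory, so let's print the dataset description.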

Description

print(boston.DESCR)
Result 1

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   

.. topic:: References

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

That explains the labels.
To look at the dataset in more detail, display its summary statistics.
boston_df.describe()

             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE         DIS         RAD         TAX     PTRATIO           B       LSTAT       PRICE
count  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000  506.000000
mean     3.613524   11.363636   11.136779    0.069170    0.554695    6.284634   68.574901    3.795043    9.549407  408.237154   18.455534  356.674032   12.653063   22.532806
std      8.601545   23.322453    6.860353    0.253994    0.115878    0.702617   28.148861    2.105710    8.707259  168.537116    2.164946   91.294864    7.141062    9.197104
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000    1.129600    1.000000  187.000000   12.600000    0.320000    1.730000    5.000000
25%      0.082045    0.000000    5.190000    0.000000    0.449000    5.885500   45.025000    2.100175    4.000000  279.000000   17.400000  375.377500    6.950000   17.025000
50%      0.256510    0.000000    9.690000    0.000000    0.538000    6.208500   77.500000    3.207450    5.000000  330.000000   19.050000  391.440000   11.360000   21.200000
75%      3.677083   12.500000   18.100000    0.000000    0.624000    6.623500   94.075000    5.188425   24.000000  666.000000   20.200000  396.225000   16.955000   25.000000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000   12.126500   24.000000  711.000000   22.000000  396.900000   37.970000   50.000000

from sklearn.model_selection import train_test_split

# Split into training data and test data
x_train, x_test, t_train, t_test = train_test_split(boston.data, boston.target, random_state=0)
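
As a rough illustration of what the split above produces, here is a minimal sketch that runs train_test_split on synthetic arrays of the same shape as the Boston data (506 rows, 13 columns). The arrays x_demo and t_demo are made up for this example and are not part of the original post. With no test_size given, scikit-learn holds out 25% of the rows by default.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins (assumption): same shape as boston.data / boston.target
x_demo = np.arange(506 * 13, dtype=float).reshape(506, 13)
t_demo = np.arange(506, dtype=float)

# Same call pattern as in the post; random_state fixes the shuffle
xd_train, xd_test, td_train, td_test = train_test_split(
    x_demo, t_demo, random_state=0
)

print(xd_train.shape, xd_test.shape)  # row counts of each part

# Rows of x and t stay paired after shuffling:
# by construction, the first column of row i is 13 * i and the target is i
print(np.array_equal(xd_train[:, 0], 13 * td_train))
```

Each target value stays attached to its own feature row, which is why the features and targets can be passed to the same call.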
from sklearn import linear_model

# Select the RM (average number of rooms) column
x_rm_train = x_train[:, [5]]
x_rm_test = x_test[:, [5]]
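
The column is selected with [5] (a list) rather than 5 because LinearRegression.fit expects a 2-D feature array of shape (n_samples, n_features). A small sketch of the difference, using a made-up array named demo:

```python
import numpy as np

# Hypothetical 3x4 array standing in for the feature matrix
demo = np.arange(12.0).reshape(3, 4)

col_2d = demo[:, [1]]  # list index keeps the column axis: shape (3, 1)
col_1d = demo[:, 1]    # integer index drops the column axis: shape (3,)

print(col_2d.shape, col_1d.shape)
```

The 2-D form is the one used in the post, so the single RM column can be passed to fit without reshaping.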

model = linear_model.LinearRegression()  # linear regression model
model.fit(x_rm_train, t_train)  # train the model
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
a = model.coef_  # coefficient (slope)
b = model.intercept_  # intercept
print("a: ", a)
print("b: ", b)
a:  [9.31294923]
b:  -36.180992646339206
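
For a single explanatory variable, the least-squares slope and intercept have a closed form: a = sum((x - mean(x)) * (t - mean(t))) / sum((x - mean(x))^2) and b = mean(t) - a * mean(x). The following is a minimal sketch of that formula with NumPy, on synthetic data generated for this example (the values 9.0 and -36.0 are arbitrary choices loosely resembling the result above, not the Boston data itself). np.polyfit is used only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): a noisy line, not the real Boston values
x = rng.uniform(4.0, 9.0, size=200)
t = 9.0 * x - 36.0 + rng.normal(0.0, 2.0, size=200)

# Closed-form least squares for one explanatory variable
x_mean, t_mean = x.mean(), t.mean()
a_hat = np.sum((x - x_mean) * (t - t_mean)) / np.sum((x - x_mean) ** 2)
b_hat = t_mean - a_hat * x_mean

# Cross-check against NumPy's degree-1 polynomial fit
a_ref, b_ref = np.polyfit(x, t, deg=1)

print("a: ", a_hat, "b: ", b_hat)
print("matches polyfit:", np.isclose(a_hat, a_ref), np.isclose(b_hat, b_ref))
```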
import matplotlib.pyplot as plt

plt.scatter(x_rm_train, t_train, label="Train")
plt.scatter(x_rm_test, t_test, label="Test")

y_reg = a * x_rm_train + b  # regression line
plt.plot(x_rm_train, y_reg, c="red")

plt.xlabel("Rooms")
plt.ylabel("Price")
plt.legend()
plt.show()
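
The red line is y = a * x + b, so the fitted coefficients can also be used directly to estimate a price from a room count. A small worked example, with the coefficient values copied from the printed output earlier in this post; predict_price is a hypothetical helper name introduced only for this sketch.

```python
# Coefficients copied from the output printed earlier in this post
a_fit = 9.31294923
b_fit = -36.180992646339206


def predict_price(rooms):
    """Return the fitted price (in $1000s) for an average room count."""
    return a_fit * rooms + b_fit


price_6_rooms = predict_price(6.0)
print(round(price_6_rooms, 2))
```

For 6 rooms this gives roughly 19.7, which corresponds to about $19,700 because PRICE is recorded in units of $1000.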

The line follows the trend in the data reasonably well.
To be continued in the next post.
