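Prediction Using Supervised ML

Predict a student's percentage score based on the number of hours studied.

This is a simple linear regression task, as it involves just two variables.

The data can be found at the URL given in the data-loading step below.

You can use R, Python, SAS Enterprise Miner, or any other tool.

What is the predicted score if a student studies for 9.25 hours per day?

Demo

Prediction using supervised machine learning

In this regression task, I predict the percentage score a student is expected to get based on the number of hours studied.

This is a simple linear regression task involving only two variables.

Importing the required libraries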

# Importing the required libraries
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Reading data from the source

# Reading data from remote link
url = "https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
s_data = pd.read_csv(url)
print("Data import successful")
s_data.head(10)

Step 2 - Visualizing the input data

# Plotting the distribution of scores
s_data.plot(x='Hours', y='Scores', style='o')  
plt.title('Hours vs Percentage')  
plt.xlabel('Hours Studied')  
plt.ylabel('Percentage Score')  
plt.show()

From the graph, we can reasonably assume a positive linear relationship between the hours studied and the percentage score.
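The visual impression can also be checked numerically. The following is a minimal sketch, assuming a small made-up table (the `toy` DataFrame and its values are hypothetical, not the real dataset), that computes the Pearson correlation between the two columns. Values close to +1 suggest a strong positive linear relationship.

```python
import pandas as pd

# Hypothetical stand-in data, NOT the real student dataset
toy = pd.DataFrame({
    "Hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Scores": [12, 25, 31, 44, 52],
})

# Pearson correlation coefficient between the two columns
corr = toy["Hours"].corr(toy["Scores"])
print("Pearson correlation:", corr)
```

On the real data, the same call would be `s_data["Hours"].corr(s_data["Scores"])`.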

Step 3 - Data preprocessing

This step divides the data into "attributes" (inputs) and "labels" (outputs).

X = s_data.iloc[:, :-1].values  
y = s_data.iloc[:, 1].values
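To see what these two slices produce, here is a small sketch on a made-up three-row table (the `toy` DataFrame and its values are hypothetical). It illustrates that `iloc[:, :-1]` yields a 2-D array, which scikit-learn expects for features, while `iloc[:, 1]` yields a 1-D array of labels.

```python
import pandas as pd

# Hypothetical stand-in data with the same two-column layout
toy = pd.DataFrame({"Hours": [1.5, 4.0, 6.5], "Scores": [18, 40, 66]})

X_toy = toy.iloc[:, :-1].values  # every column except the last -> 2-D array
y_toy = toy.iloc[:, 1].values    # the second column -> 1-D array

print(X_toy.shape)  # (3, 1)
print(y_toy.shape)  # (3,)
```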

Step 4 - Training the model

Split the data into training and test sets, then train the algorithm.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0) 
regressor = LinearRegression()  
regressor.fit(X_train.reshape(-1,1), y_train) 

print("Training complete.")
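As a rough illustration of what `test_size=0.2` and `random_state=0` do, the sketch below splits ten synthetic samples (the arrays `X_toy` and `y_toy` are assumptions made up for this example). It shows the 80/20 split and that fixing the seed makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 10 samples, one feature
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = 5.0 * X_toy.ravel() + 2.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)
print(len(X_tr), len(X_te))  # 8 2

# Repeating the call with the same seed returns the same test rows
X_tr2, X_te2, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te2))  # True
```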

Step 5 - Plotting the regression line

Now that the model is trained, it is time to visualize the best-fit regression line.

# Plotting the regression line
line = regressor.coef_*X+regressor.intercept_

# Plotting the full dataset together with the fitted line
plt.scatter(X, y)
plt.plot(X, line, color='red')
plt.show()
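The line computed by hand from `coef_` and `intercept_` is the same thing the model returns from `predict`. A minimal sketch, assuming synthetic points that lie exactly on y = 2x + 1 (the data and the name `model` are made up for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points lying exactly on y = 2x + 1 (illustration only)
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X_toy, y_toy)

# Slope * x + intercept, computed by hand; result has shape (4, 1)
manual_line = model.coef_ * X_toy + model.intercept_
# The model's own predictions; result has shape (4,)
predicted = model.predict(X_toy)

print(np.allclose(manual_line.ravel(), predicted))  # True
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```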

Step 6 - Making predictions

Now that the algorithm is trained, it is time to test the model by making some predictions.

For this we use the test set data.

# Testing data
print(X_test)
# Model Prediction 
y_pred = regressor.predict(X_test)

Step 7 - Comparing actual results with the model's predictions

# Comparing Actual vs Predicted
df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}) 
df

#Estimating training and test score
print("Training Score:",regressor.score(X_train,y_train))
print("Test Score:",regressor.score(X_test,y_test))
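For a `LinearRegression` model, `score` returns the coefficient of determination (R²) of the predictions. The sketch below, using made-up data, checks that it matches `sklearn.metrics.r2_score` on the same inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data for illustration only
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([12.0, 25.0, 31.0, 44.0, 52.0])

model = LinearRegression().fit(X_toy, y_toy)

score = model.score(X_toy, y_toy)           # R^2 via the estimator
r2 = r2_score(y_toy, model.predict(X_toy))  # R^2 via the metrics module

print(np.isclose(score, r2))  # True
```

A training score far above the test score would hint at overfitting, which is why both are printed above.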

Plotting a bar graph to show the difference between the actual and predicted values

# Plotting the Bar graph to depict the difference between the actual and predicted value

df.plot(kind='bar',figsize=(5,5))
plt.grid(which='major', linewidth='0.5', color='red')
plt.grid(which='minor', linewidth='0.5', color='blue')
plt.show()

Testing the model with our own data

# Testing the model with our own data
hours = 9.25
test = np.array([hours])
test = test.reshape(-1, 1)
own_pred = regressor.predict(test)
print("No of Hours = {}".format(hours))
print("Predicted Score = {}".format(own_pred[0]))

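Step 8 - Evaluating the model

The final step is to evaluate the performance of the algorithm. This step is particularly important for comparing how well different algorithms perform on a given dataset. Here, several error metrics are computed to assess the model's performance and prediction accuracy.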

from sklearn import metrics  
print('Mean Absolute Error:',metrics.mean_absolute_error(y_test, y_pred)) 
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('R-2:', metrics.r2_score(y_test, y_pred))

Mean Absolute Error: 4.183859899002975
Mean Squared Error: 21.598769307217406
Root Mean Squared Error: 4.647447612100367
R-2: 0.9454906892105355
R-2 gives the model's goodness-of-fit score. Here R-2 = 0.9454906892105355 on the test set, which is a strong score for this model.
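To make the four metrics concrete, here is a minimal sketch that computes them by hand with NumPy and compares the results against `sklearn.metrics`. The arrays `y_true` and `y_hat` are made-up values for illustration, not the results reported above.

```python
import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted values (illustration only)
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_hat = np.array([12.0, 18.0, 33.0, 39.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error
# R^2 = 1 - (residual sum of squares / total sum of squares)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MAE:", mae, "MSE:", mse, "RMSE:", rmse, "R-2:", r2)

# The hand-computed values agree with scikit-learn's implementations
print(np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat)))  # True
print(np.isclose(mse, metrics.mean_squared_error(y_true, y_hat)))   # True
print(np.isclose(r2, metrics.r2_score(y_true, y_hat)))              # True
```

MAE and RMSE are in the same units as the score, so they are the easiest to interpret; R-2 is unitless and compares the model against always predicting the mean.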

Using a supervised ML task, I was able to make predictions successfully and evaluate the model's performance on several metrics.

Reference

For more on this topic (Prediction using Supervised ML), see the original post: https://dev.to/yaswanthteja/prediction-using-supervised-ml-1d9d
