Business Analytics II Summary

Regression Analysis

Linear Regression

Regression Assumptions

  1. No Multicollinearity
  • Variance Inflation Factor (VIF) < 10 is a common rule of thumb (see the code sketch after this list).
  2. Homoskedasticity
  • Visual check (linearity & constant variance): plot residuals against fitted values; the spread should stay constant and show no pattern.
  • Breusch-Pagan Test: if p-value > 0.05, we fail to reject homoskedasticity (the errors can be treated as homoskedastic).
  • Correction with heteroskedasticity-robust standard errors: sm.OLS(y, X).fit(cov_type="HC3")
  3. Normality of Errors
  • QQ Plot
  • Normality Tests: if p-value > 0.05, we fail to reject normality (the residuals can be treated as normally distributed).
    (Kolmogorov-Smirnov, Shapiro-Wilk, Jarque-Bera, etc.)
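
A minimal sketch of running these three checks with statsmodels and scipy, assuming X is a pandas DataFrame of predictors and y the target (both hypothetical names):

  import statsmodels.api as sm
  from statsmodels.stats.outliers_influence import variance_inflation_factor
  from statsmodels.stats.diagnostic import het_breuschpagan
  from scipy import stats

  X_const = sm.add_constant(X)                       # add intercept column
  model = sm.OLS(y, X_const).fit()

  # 1. Multicollinearity: VIF per predictor (rule of thumb: < 10)
  vifs = [variance_inflation_factor(X_const.values, i)
          for i in range(1, X_const.shape[1])]       # index 0 is the constant

  # 2. Homoskedasticity: Breusch-Pagan (p > 0.05 -> fail to reject)
  bp_stat, bp_pvalue, _, _ = het_breuschpagan(model.resid, X_const)
  robust = sm.OLS(y, X_const).fit(cov_type="HC3")    # robust correction if needed

  # 3. Normality of errors: QQ plot and Shapiro-Wilk (p > 0.05 -> fail to reject)
  sm.qqplot(model.resid, line="45")                  # needs matplotlib installed
  sw_stat, sw_pvalue = stats.shapiro(model.resid)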

Non-Linear Regression

  1. Logistic Regression
  2. Probit Regression
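
A minimal sketch of both models with statsmodels, assuming X is the predictor matrix and y a binary (0/1) target (hypothetical names):

  import statsmodels.api as sm

  X_const = sm.add_constant(X)
  logit_model  = sm.Logit(y, X_const).fit()    # logistic: log-odds (logit) link
  probit_model = sm.Probit(y, X_const).fit()   # probit: standard normal CDF link
  print(logit_model.summary())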

Machine Learning

Decision Tree

Confusion Matrix

                Predicted (y=1)                      Predicted (y=0)
Actual (y=1)    True Positive (TP)                   False Negative (FN, Type II Error)
Actual (y=0)    False Positive (FP, Type I Error)    True Negative (TN)
- Accuracy  = (TP + TN) / Total
- Precision = TP / (TP + FP)
- Recall    = TP / (TP + FN)
- F1 Score  = 2 × (Precision × Recall) / (Precision + Recall)
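
A minimal sketch of these metrics with scikit-learn, assuming y_test and y_pred are binary label arrays (hypothetical names):

  from sklearn.metrics import (confusion_matrix, accuracy_score,
                               precision_score, recall_score, f1_score)

  tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
  print("Accuracy :", accuracy_score(y_test, y_pred))    # (TP + TN) / Total
  print("Precision:", precision_score(y_test, y_pred))   # TP / (TP + FP)
  print("Recall   :", recall_score(y_test, y_pred))      # TP / (TP + FN)
  print("F1 Score :", f1_score(y_test, y_pred))          # harmonic mean of the two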

Random Forest

  • Ensemble learning method: combines the predictions of many decision trees, each trained on a bootstrap sample with a random subset of features (see the sketch after this list).
  1. Data Preprocessing (Encoding, Categorizing, Normalizing, Scaling)
  2. Balancing Dataset (Up/Down Sampling)
  3. Defining Variables (Dependent/Independent)
  4. Modeling (Supervised Learning) & Cross Validation
  5. Evaluation (Accuracy Scores, Feature Importances)
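
A minimal end-to-end sketch of this workflow with scikit-learn; the DataFrame df and its binary column 'target' are hypothetical, and upsampling is done before cross-validation only to mirror the order of the steps above (in practice, resampling inside each training fold avoids leaking duplicated rows across folds):

  import pandas as pd
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score
  from sklearn.utils import resample

  # 1. Preprocessing: one-hot encode categorical columns
  df = pd.get_dummies(df, drop_first=True)

  # 2. Balancing: upsample the minority class
  majority = df[df["target"] == 0]
  minority = df[df["target"] == 1]
  minority_up = resample(minority, replace=True,
                         n_samples=len(majority), random_state=42)
  balanced = pd.concat([majority, minority_up])

  # 3. Define dependent / independent variables
  X = balanced.drop(columns="target")
  y = balanced["target"]

  # 4. Modeling & cross-validation
  rf = RandomForestClassifier(n_estimators=100, random_state=42)
  scores = cross_val_score(rf, X, y, cv=5)

  # 5. Evaluation: mean accuracy and feature importances
  rf.fit(X, y)
  importances = pd.Series(rf.feature_importances_, index=X.columns)
  print(scores.mean())
  print(importances.sort_values(ascending=False).head())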

Neural Networks

MLPClassifier(activation='relu', hidden_layer_sizes=(10,), max_iter=100)
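
MLPs are sensitive to feature scale, so scaling before fitting usually helps; a minimal usage sketch, assuming X_train, X_test, y_train, y_test splits (hypothetical names; note that max_iter=100 may raise a ConvergenceWarning on real data):

  from sklearn.neural_network import MLPClassifier
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler

  mlp = make_pipeline(
      StandardScaler(),                          # scale features first
      MLPClassifier(activation='relu', hidden_layer_sizes=(10,),
                    max_iter=100, random_state=42),
  )
  mlp.fit(X_train, y_train)
  print(mlp.score(X_test, y_test))               # accuracy on the test set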

Support Vector Machine

  1. Linear SVM
    SVC(kernel='linear')
  2. Non-linear SVM
  • Kernels: Polynomial ('poly'), Gaussian / Radial Basis Function ('rbf'), Sigmoid ('sigmoid'); compared in the sketch below
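
A minimal sketch comparing the kernels, assuming X_train, X_test, y_train, y_test splits (hypothetical names); like MLPs, SVMs benefit from scaled features:

  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import StandardScaler
  from sklearn.svm import SVC

  for kernel in ("linear", "poly", "rbf", "sigmoid"):
      svm = make_pipeline(StandardScaler(), SVC(kernel=kernel))
      svm.fit(X_train, y_train)
      print(kernel, svm.score(X_test, y_test))   # test accuracy per kernel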

Naive Bayes

GaussianNB()
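
A minimal usage sketch; GaussianNB assumes each feature is conditionally normal given the class, and X_train etc. are hypothetical splits:

  from sklearn.naive_bayes import GaussianNB

  nb = GaussianNB()
  nb.fit(X_train, y_train)
  print(nb.predict_proba(X_test)[:5])    # per-class probabilities
  print(nb.score(X_test, y_test))        # accuracy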

K-Nearest Neighbor

KNeighborsClassifier(n_neighbors=10)
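
A minimal sketch; n_neighbors (k) is the main hyperparameter, and picking it by cross-validation is common. X and y are a hypothetical preprocessed feature matrix and label vector:

  from sklearn.model_selection import cross_val_score
  from sklearn.neighbors import KNeighborsClassifier

  for k in (3, 5, 10, 20):                       # candidate neighbor counts
      knn = KNeighborsClassifier(n_neighbors=k)
      print(k, cross_val_score(knn, X, y, cv=5).mean())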
