Machine Learning Study Notes (Week 6)
Evaluating a Hypothesis
1. Model Selection
Break down our dataset into three sets:
- Training set: 60%
- Cross Validation set: 20%
- Test set: 20%
Suppose we have several hypothesis functions with different polynomial degrees d. To select the best model:
- Minimize the training cost for each candidate degree to obtain a parameter vector $\Theta^{(d)}$.
- Compute the cross validation error $J_{cv}(\Theta^{(d)})$ for each candidate and pick the degree with the lowest value.
- Estimate the generalization error of the chosen model with the test set error $J_{test}(\Theta^{(d)})$ (a sketch follows this list).
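Below is a minimal Python sketch of this procedure (my own illustration, not from the original notes; the synthetic data, the 1–10 degree range, and the `squared_error` helper are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = np.sin(x) + rng.normal(scale=0.2, size=100)      # synthetic data

# 60% / 20% / 20% split into training / cross validation / test sets
idx = rng.permutation(len(x))
tr, cv, te = idx[:60], idx[60:80], idx[80:]

def squared_error(coeffs, xs, ys):
    """Half the mean squared error of a fitted polynomial, analogous to J(Θ)."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2) / 2

best_d, best_cv_err, best_fit = None, np.inf, None
for d in range(1, 11):                                # candidate degrees
    coeffs = np.polyfit(x[tr], y[tr], deg=d)          # minimize training error
    cv_err = squared_error(coeffs, x[cv], y[cv])      # evaluate on the CV set
    if cv_err < best_cv_err:
        best_d, best_cv_err, best_fit = d, cv_err, coeffs

# Report generalization error once, on the untouched test set
print(best_d, squared_error(best_fit, x[te], y[te]))
```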
2. Test Set Error
For linear regression, the test set error is:

$$J_{test}(Θ) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \bigl(h_Θ(x_{test}^{(i)}) - y_{test}^{(i)}\bigr)^2$$

For classification, the misclassification (0/1) error of a single example is:

$$err\bigl(h_Θ(x), y\bigr) =
\begin{cases}
1 & \text{if } h_Θ(x) \geq 0.5 \text{ and } y = 0, \text{ or } h_Θ(x) < 0.5 \text{ and } y = 1 \\
0 & \text{otherwise}
\end{cases}$$
The average test error over the test set is:

$$\text{Test Error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err\bigl(h_Θ(x_{test}^{(i)}), y_{test}^{(i)}\bigr)$$
This gives us the proportion of the test data that was misclassified.
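As a small illustration (my addition, not part of the original notes; the 0.5 threshold follows the definition above):

```python
import numpy as np

def misclassification_error(probs, y):
    """Fraction of test examples misclassified at the 0.5 threshold.

    probs: hypothesis outputs h_Θ(x) on the test set, values in [0, 1]
    y:     true labels, 0 or 1
    """
    preds = (np.asarray(probs) >= 0.5).astype(int)
    return float(np.mean(preds != np.asarray(y)))

# Example: 1 of 4 test examples is misclassified -> 0.25
print(misclassification_error([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))
```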
Bias vs. Variance
1. Degree of the Polynomial d and Bias/Variance
As the degree d grows, the training error $J_{train}(\Theta)$ keeps decreasing, while the cross validation error $J_{cv}(\Theta)$ first decreases and then rises again: a low degree underfits (high bias), and a high degree overfits (high variance).
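The two cases can be told apart by comparing the two errors:

$$\text{High bias (underfitting): } J_{train}(Θ) \text{ is high and } J_{cv}(Θ) \approx J_{train}(Θ)$$

$$\text{High variance (overfitting): } J_{train}(Θ) \text{ is low and } J_{cv}(Θ) \gg J_{train}(Θ)$$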

2. Regularization and Bias/Variance
A large $\lambda$ penalizes the parameters heavily and leads to underfitting (high bias); a very small $\lambda$ barely regularizes and leaves a flexible model free to overfit (high variance); an intermediate value usually works best.

How to choose $\lambda$:
- Create a list of candidate values, e.g. $\lambda \in \{0, 0.01, 0.02, 0.04, \ldots, 10.24\}$, roughly doubling at each step.
- For each candidate, minimize the regularized cost on the training set to obtain $\Theta^{(\lambda)}$.
- Compute the cross validation error $J_{cv}(\Theta^{(\lambda)})$ without the regularization term and pick the $\lambda$ with the lowest value.
- Report the test set error $J_{test}(\Theta)$ for the chosen $\lambda$ (see the sketch below).
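A sketch of this selection loop in Python, assuming scikit-learn's `Ridge` as the regularized linear model and synthetic data (both choices are mine, not from the notes):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=100)

# Fixed, fairly high-degree features so that regularization matters
X_poly = PolynomialFeatures(degree=8, include_bias=False).fit_transform(X)

tr, cv = slice(0, 60), slice(60, 80)   # 60/20 split; the rest would be the test set
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]

def cv_error(lam):
    model = Ridge(alpha=lam).fit(X_poly[tr], y[tr])   # train with this λ
    resid = model.predict(X_poly[cv]) - y[cv]
    return np.mean(resid ** 2) / 2                    # unregularized J_cv

best_lambda = min(lambdas, key=cv_error)
print(best_lambda)
```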
3. Learning Curves
A learning curve plots the training error and the cross validation error as functions of the training set size m (a sketch of how to compute the points follows).
- High bias: both errors quickly converge to a similar, high value; collecting more training data will not help much by itself.
- High variance: the training error stays low while the cross validation error remains noticeably higher; collecting more training data is likely to help.
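A minimal way to compute such a curve (my own sketch using synthetic data and plain linear regression, not taken from the notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(120, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=120)

X_tr, y_tr = X[:80], y[:80]          # training pool
X_cv, y_cv = X[80:], y[80:]          # cross validation set

def half_mse(model, Xs, ys):
    return np.mean((model.predict(Xs) - ys) ** 2) / 2

# Train on progressively larger subsets and record both errors
for m in range(5, 81, 15):
    model = LinearRegression().fit(X_tr[:m], y_tr[:m])
    print(m, half_mse(model, X_tr[:m], y_tr[:m]), half_mse(model, X_cv, y_cv))
```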


4. What to Do Next to Improve
Our decision process can be broken down as follows:
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing λ: Fixes high bias
- Increasing λ: Fixes high variance
Error Analysis
Choose Error Metrics:
For skewed classes, overall accuracy alone is misleading, so precision and recall are computed from the counts of true/false positives and negatives on the cross validation set:
- Precision = true positives / (true positives + false positives)
- Recall = true positives / (true positives + false negatives)
- F1 Score = 2 · (Precision · Recall) / (Precision + Recall)
*Precision, Recall and F1 Score are good metrics, particularly when dealing with skewed data.
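A small illustrative computation of these three metrics (my addition, treating class 1 as the rare, positive class):

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels, with class 1 as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(precision_recall_f1([1, 0, 0, 1, 0, 1], [1, 0, 1, 0, 0, 1]))  # ~(0.67, 0.67, 0.67)
```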
Using large data sets usually helps, provided that the features x carry enough information to predict y (a human expert could do it from x alone) and the learning algorithm has enough parameters to be low-bias; under those conditions a very large training set makes overfitting unlikely.

It’s not who has the best algorithm that wins.
It’s who has the most data.