Machine Learning Study Notes (Week 6)
Evaluating a Hypothesis
1. Model Selection
Break down our dataset into three sets:
- Training set: 60%
- Cross Validation set: 20%
- Test set: 20%
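Suppose we have several hypothesis functions with different polynomial degrees. To select the best model:
- Optimize the parameters Θ on the training set for each polynomial degree.
- Find the polynomial degree d with the lowest error on the cross validation set.
- Estimate the generalization error on the test set, using the Θ of the selected degree d.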
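A minimal sketch of such a selection, assuming synthetic data and NumPy's `polyfit`/`polyval` as the fitting routine (neither comes from the original notes). The comments mark where each of the three sets is used.

```python
import numpy as np

def half_mse(theta, x, y):
    """J(Θ) = 1/(2m) * sum((h(x) - y)^2) for a polynomial hypothesis."""
    residual = np.polyval(theta, x) - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (an assumption for illustration): a noisy cubic trend.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.5 * x**3 - x + rng.normal(0.0, 0.1, 200)

# 60% / 20% / 20% split; the samples were drawn in random order already.
x_train, y_train = x[:120], y[:120]
x_cv, y_cv = x[120:160], y[120:160]
x_test, y_test = x[160:], y[160:]

degrees = range(1, 9)
# Fit Θ on the training set for each candidate degree.
fits = {d: np.polyfit(x_train, y_train, d) for d in degrees}
# Compare the candidates on the cross validation set only.
cv_errors = {d: half_mse(fits[d], x_cv, y_cv) for d in degrees}
best_d = min(cv_errors, key=cv_errors.get)
# Report the generalization error on the test set, which was never used above.
test_error = half_mse(fits[best_d], x_test, y_test)
print(f"best degree: {best_d}, test error: {test_error:.4f}")
```

Because the degree is chosen on the cross validation set, the test error stays an unbiased estimate for the selected model.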
2. Test Set Error
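For linear regression:
$J_{test}(Θ) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \bigl(h_Θ(x_{test}^{(i)})-y_{test}^{(i)} \bigr)^2$
For classification, the misclassification error (0/1 error) is:
$err\bigl(h_Θ(x),y\bigr) = \begin{cases} 1 & \text{if } \bigl(h_Θ(x)\geq0.5 \text{ and } y=0\bigr) \text{ or } \bigl(h_Θ(x)<0.5 \text{ and } y=1\bigr) \\ 0 & \text{otherwise} \end{cases}$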
The average test error for the test set is:
Test Error = $\frac{1}{m_{test}}\sum_{i=1}^{m_{test}}err\bigl(h_Θ(x_{test}^{(i)}),y_{test}^{(i)}\bigr)$
This gives us the proportion of the test data that was misclassified.
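As a small illustration, the average test error can be computed as below. The function name and the five sample values are hypothetical; the 0.5 threshold follows the definition of the error above.

```python
def misclassification_error(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    errors = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        errors += int(predicted != y)
    return errors / len(labels)

# Hypothetical hypothesis outputs h_Θ(x) and true labels for five test examples.
h_test = [0.9, 0.2, 0.6, 0.4, 0.7]
y_test = [1, 0, 0, 1, 1]
print(misclassification_error(h_test, y_test))  # prints 0.4 (2 of 5 misclassified)
```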
Bias vs. Variance
1. Degree of the Polynomial d and Bias/Variance
2. Regularization and Bias/Variance
How to choose $\lambda$:
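One common approach is to train a model for each value in a list of candidate $\lambda$ values, then keep the one with the lowest cross validation error, measured without the regularization term. The sketch below assumes synthetic data, degree-8 polynomial features, and a regularized normal equation as the solver. The helper names are hypothetical.

```python
import numpy as np

def poly_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def fit_regularized(X, y, lam):
    """Regularized normal equation; the bias term (column 0) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cost(theta, X, y):
    """Unregularized J(Θ) = 1/(2m) * sum((XΘ - y)^2)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, 100)
X = poly_features(x, 8)
X_train, y_train = X[:60], y[:60]
X_cv, y_cv = X[60:80], y[60:80]
X_test, y_test = X[80:], y[80:]

lambdas = [0.0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
# Train with regularization, one Θ per candidate λ.
thetas = {lam: fit_regularized(X_train, y_train, lam) for lam in lambdas}
# Evaluate on the cross validation set without the regularization term.
cv_errors = {lam: cost(thetas[lam], X_cv, y_cv) for lam in lambdas}
best_lambda = min(cv_errors, key=cv_errors.get)
# Report the test error of the selected model only.
test_error = cost(thetas[best_lambda], X_test, y_test)
print(f"best lambda: {best_lambda}, test error: {test_error:.4f}")
```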
3. Learning Curves
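A learning curve tracks the training error and the cross validation error as the number of training examples m grows. The sketch below assumes synthetic data and a deliberately underfitting straight-line model, so it illustrates the high-bias case. The helper names are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of h(x) = θ0 + θ1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def cost(theta, x, y):
    """J(Θ) = 1/(2m) * sum((h(x) - y)^2)."""
    residual = theta[0] + theta[1] * x - y
    return float(residual @ residual) / (2 * len(y))

# Synthetic data (an assumption): a quadratic target fitted by a straight line.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 140)
y = x**2 + rng.normal(0.0, 0.05, 140)
x_train, y_train = x[:100], y[:100]
x_cv, y_cv = x[100:], y[100:]

sizes = list(range(2, 101, 7))
train_errors, cv_errors = [], []
for m in sizes:
    theta = fit_line(x_train[:m], y_train[:m])
    # Training error is measured on the m examples actually used for fitting.
    train_errors.append(cost(theta, x_train[:m], y_train[:m]))
    # Cross validation error is always measured on the full CV set.
    cv_errors.append(cost(theta, x_cv, y_cv))

for m, j_train, j_cv in zip(sizes, train_errors, cv_errors):
    print(f"m={m:3d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")
```

With high bias, both curves tend to level off at a similar, relatively high error, so more data alone helps little. With high variance, a gap between the two curves remains and more data tends to narrow it.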
4. What to Do Next to Improve
Our decision process can be broken down as follows:
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing λ: Fixes high bias
- Increasing λ: Fixes high variance
Error Analysis
Choose Error Metrics:
- Precision, Recall and F1 Score are good metrics, particularly when dealing with skewed data.
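To see why these metrics matter on skewed data, the sketch below compares two hypothetical classifiers on a toy set with 2 positives among 10 examples. Both reach 80% accuracy, but only one of them ever finds a positive. Returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(predictions, labels):
    """Precision, recall and F1 for binary labels, with 1 as the positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Skewed toy data (hypothetical): only 2 of 10 examples are positive.
labels      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_zero = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # predicts y = 0 for everything
classifier  = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # finds one of the two positives

print(precision_recall_f1(always_zero, labels))  # (0.0, 0.0, 0.0)
print(precision_recall_f1(classifier, labels))   # (0.5, 0.5, 0.5)
```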
Using large data sets usually helps!
It’s not who has the best algorithm that wins.
It’s who has the most data.
Reference
Original post: Machine Learning Study Notes (Week 6), https://qiita.com/CHrIs23436939/items/d114ddc0a6248333688e
This text may be freely shared or copied, provided this URL is kept as the reference URL.