Machine Learning - Train/Test

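Evaluate Your Model

In machine learning we create models to predict the outcome of certain events, as in the previous chapter, where we predicted the CO2 emission of a car from its weight and engine size.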

To measure whether the model is good enough, we can use a method called Train/Test.

What is Train/Test?

Train/Test is a method for measuring the accuracy of your model.

It is called Train/Test because you split the data set into two sets: a training set and a testing set.

80% for training, 20% for testing.

You train the model using the training set.

You test the model using the testing set.

Training the model means creating the model.

Testing the model means checking the accuracy of the model.

Start With a Data Set

Start with a data set you want to test.

Our data set illustrates 100 customers in a shop and their shopping habits.

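Example

import numpy
import matplotlib.pyplot as plt
# Fix the seed so the same random data is generated on every run
numpy.random.seed(2)
# 100 customers: x is minutes in the shop, y is the amount spent
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
plt.scatter(x, y)
plt.show()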

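Result:

The x axis represents the number of minutes before making a purchase.

The y axis represents the amount of money spent on the purchase.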



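Split Into Train/Test

The training set should be a random selection of 80% of the original data.

The testing set should be the remaining 20%.

Because the sample data were generated randomly, taking the first 80 points already amounts to a random selection: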

train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
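
The slicing above relies on the data already being in random order. For data with a meaningful order (for example, sorted by date), a shuffled split is usually safer. The following is a minimal sketch of one way to do that with NumPy; the helper name `split_train_test`, its `train_fraction` parameter, and the seed value are assumptions made for illustration and are not part of the original example.

```python
import numpy

def split_train_test(x, y, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle paired arrays, then split them."""
    x = numpy.asarray(x)
    y = numpy.asarray(y)
    rng = numpy.random.default_rng(seed)
    order = rng.permutation(len(x))  # random ordering of the indices
    n_train = int(round(train_fraction * len(x)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Index x and y with the same indices so the pairs stay aligned
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Same sample data as in the tutorial
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x

train_x, train_y, test_x, test_y = split_train_test(x, y)
print(len(train_x), len(test_x))  # 80 20
```

The rest of this chapter keeps the simple slicing version, so the scores quoted later refer to that split and not to this shuffled one.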

Display the Training Set

Display the same scatter plot for the training set:

Example

plt.scatter(train_x, train_y)
plt.show()

Result:

It looks like the original data set, so it seems to be a fair selection:



Display the Testing Set

To make sure the testing set is not completely different, we will take a look at it as well.

Example

plt.scatter(test_x, test_y)
plt.show()

Result:

The testing set also looks like the original data set:



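Fit the Data Set

What does the data set look like? In my opinion the best fit would be a polynomial regression, so let us draw a polynomial regression line.

To draw a line through the data points, we use the plot() function of the matplotlib module:

Example

Draw a polynomial regression line through the data points: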

import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
# First 80 points for training, last 20 for testing
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
# Fit a degree-4 polynomial to the training data only
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
# 100 evenly spaced x values from 0 to 6 for drawing the curve
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()


The result supports my suggestion that the data set fits a polynomial regression, even though the model gives some odd results if we try to predict values outside the range of the data. Example: the line indicates that a customer spending 6 minutes in the shop would make a purchase worth 200. That is probably a sign of overfitting.

But what about the R-squared score? The R-squared score is a good indicator of how well my data set fits the model.

R2

Remember R2, also known as R-squared?

It measures the relationship between the x axis and the y axis. The value typically ranges from 0 to 1, where 0 means no relationship and 1 means totally related.

The sklearn module has a function called r2_score() that will help us find this relationship.

In this case we would like to measure the relationship between the minutes a customer stays in the shop and how much money they spend.
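
To make the score less of a black box, here is a minimal sketch of how R2 can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy values are assumptions made for illustration; the examples in this chapter use sklearn's `r2_score()` instead.

```python
import numpy

def r_squared(y_true, y_pred):
    """Illustrative R2: 1 - (residual sum of squares / total sum of squares)."""
    y_true = numpy.asarray(y_true, dtype=float)
    y_pred = numpy.asarray(y_pred, dtype=float)
    ss_res = numpy.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)  # error of always guessing the mean
    return 1.0 - ss_res / ss_tot

y_true = numpy.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y_true, y_true))                        # perfect predictions: 1.0
print(r_squared(y_true, numpy.full(4, y_true.mean())))  # always the mean: 0.0
print(r_squared(y_true, y_true[::-1]))                  # worse than the mean: -3.0
```

The last line shows that, with this definition, a model that predicts worse than the plain mean gets a negative score, so a value between 0 and 1 is the usual case and not a hard limit.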

Example

How well does my training data fit a polynomial regression?

import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(train_y, mymodel(train_x))
print(r2)


Note: The result 0.799 shows that there is an OK relationship.

Bring in the Testing Set

Now we have a model that is OK, at least on the training data.

Next we want to test the model with the testing data as well, to see whether it gives a similar result.

Example

Let us find the R2 score when using the testing data:

import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)


Note: The result 0.809 shows that the model fits the testing set as well, so we can be reasonably confident in using the model to predict future values.
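
Comparing the training score with the testing score is also a practical way to look for overfitting. The sketch below repeats the fit for several polynomial degrees and records both scores; the degrees chosen and the hand-written `r_squared` helper are assumptions made for illustration, not part of the original example.

```python
import numpy

def r_squared(y_true, y_pred):
    """Illustrative R2: 1 - (residual sum of squares / total sum of squares)."""
    y_true = numpy.asarray(y_true, dtype=float)
    y_pred = numpy.asarray(y_pred, dtype=float)
    ss_res = numpy.sum((y_true - y_pred) ** 2)
    ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Same data and split as in the tutorial
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x, train_y = x[:80], y[:80]
test_x, test_y = x[80:], y[80:]

scores = {}
for degree in (1, 2, 4, 6):
    # Fit on the training set only, then score on both sets
    model = numpy.poly1d(numpy.polyfit(train_x, train_y, degree))
    scores[degree] = (r_squared(train_y, model(train_x)),
                      r_squared(test_y, model(test_x)))
    print(degree, scores[degree])
```

The training score cannot get worse as the degree grows, because a higher-degree polynomial can always reproduce a lower-degree one. The testing score has no such guarantee, so a training score that keeps rising while the testing score stalls or drops is a common warning sign of overfitting.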

Predict Values

Now that we have established that our model is OK, we can start predicting new values.

Example

How much money will a customer spend if she or he stays in the shop for 5 minutes?

print(mymodel(5))


The example predicts that the customer will spend 22.88 dollars, which seems to correspond to the fitted curve in the diagram.
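
Since the fitted curve behaves oddly outside the observed data, it can help to check whether a requested value lies within the range the model was trained on. The following is a minimal sketch; the helper name `predict_with_range_check` and the sample inputs are assumptions made for illustration.

```python
import numpy

# Same data, split, and model as in the tutorial
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x, train_y = x[:80], y[:80]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))

def predict_with_range_check(model, values, seen_x):
    """Hypothetical helper: predict, and flag inputs outside the training range."""
    values = numpy.atleast_1d(numpy.asarray(values, dtype=float))
    in_range = (values >= seen_x.min()) & (values <= seen_x.max())
    return model(values), in_range

# The average visit length, the 5 minutes from the example, and an extreme value
minutes = [train_x.mean(), 5, 100]
predictions, in_range = predict_with_range_check(mymodel, minutes, train_x)
for m, p, ok in zip(minutes, predictions, in_range):
    label = "inside training range" if ok else "extrapolation, treat with caution"
    print(f"{m:.2f} minutes -> {p:.2f} ({label})")
```

A poly1d model accepts arrays as well as single numbers, so several predictions can be made in one call.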