Machine Learning - Standard Deviation
What is the standard deviation?
Standard deviation is a number that describes how spread out the values are.
A low standard deviation indicates that most numbers are close to the mean (the average).
A high standard deviation indicates that these values are distributed over a wider range.
For example: this time we have registered the speeds of 7 cars:
speed = [86,87,88,86,87,85,86]
The standard deviation is:
0.9
This means that most values are within 0.9 of the mean, which is 86.4.
Let's deal with a wider range of numbers:
speed = [32,111,138,28,59,77,97]
The standard deviation is:
37.85
This means that most values are within 37.85 of the mean (which is 77.4).
As you can see, a higher standard deviation indicates that these values are distributed over a wider range.
The NumPy module has a method for calculating the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
import numpy
speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)
Example
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
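To make the phrase "most values are within one standard deviation of the mean" concrete, the sketch below counts how many values in each of the two lists fall inside that range. The helper count_within_one_std is a hypothetical name introduced only for this illustration; it is not part of NumPy.

```python
import numpy

def count_within_one_std(values):
    # Hypothetical helper (not a NumPy function): counts the values whose
    # distance from the mean is at most one standard deviation.
    data = numpy.array(values)
    mean = data.mean()
    std = data.std()
    return int(numpy.sum(numpy.abs(data - mean) <= std))

narrow = [86,87,88,86,87,85,86]
wide = [32,111,138,28,59,77,97]

print(count_within_one_std(narrow), "of", len(narrow))  # 5 of 7
print(count_within_one_std(wide), "of", len(wide))      # 4 of 7
```

In both lists more than half of the values lie within one standard deviation of the mean, but for the second list that range is far wider.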
Variance
Variance is another number that indicates the degree of dispersion of the values.
In fact, if you take the square root of the variance, you will get the standard deviation!
Or vice versa, if you multiply the standard deviation by itself, you will get the variance!
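This relationship can be checked with NumPy. The sketch below uses numpy.isclose rather than == because floating-point results can differ in the last digits.

```python
import numpy

speed = [32,111,138,28,59,77,97]

variance = numpy.var(speed)
std = numpy.std(speed)

# The square root of the variance equals the standard deviation ...
print(numpy.isclose(numpy.sqrt(variance), std))  # True

# ... and the standard deviation multiplied by itself equals the variance.
print(numpy.isclose(std * std, variance))        # True
```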
To calculate the variance, you must perform the following operations:
1. Calculate the mean:
(32+111+138+28+59+77+97) / 7 = 77.4
2. For each value: find the difference from the average:
32 - 77.4 = -45.4
111 - 77.4 = 33.6
138 - 77.4 = 60.6
28 - 77.4 = -49.4
59 - 77.4 = -18.4
77 - 77.4 = -0.4
97 - 77.4 = 19.6
3. For each difference: find the square value:
(-45.4)² = 2061.16
(33.6)² = 1128.96
(60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² = 338.56
(-0.4)² = 0.16
(19.6)² = 384.16
4. The variance is the average of these squared differences:
(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 ≈ 1432.25
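The four steps can be sketched in plain Python as follows. The variable names are chosen for this illustration only. Because the code keeps the unrounded mean instead of 77.4, its result (about 1432.24) differs slightly in the last digit from the hand calculation.

```python
import numpy

speed = [32,111,138,28,59,77,97]

# 1. Calculate the mean.
mean = sum(speed) / len(speed)

# 2. For each value, find the difference from the mean.
differences = [value - mean for value in speed]

# 3. For each difference, find the square.
squares = [d ** 2 for d in differences]

# 4. The variance is the average of the squared differences.
variance = sum(squares) / len(squares)

print(round(variance, 2))

# The manual result agrees with NumPy's var(), which also divides
# by the number of values by default.
print(numpy.isclose(variance, numpy.var(speed)))
```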
Fortunately, NumPy has a method for calculating the variance: var()
Example
Use the NumPy var() method to find the variance:
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.var(speed)
print(x)
Standard Deviation
As we have seen, the standard deviation is the square root of the variance:
√1432.25 = 37.85
Or, as in the earlier example, use NumPy to calculate the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)
Symbols
Standard deviation is often represented by the symbol sigma: σ
Variance is often represented by the symbol sigma squared: σ²
Chapter Summary
Standard deviation and variance are terms often used in machine learning, so it is important to understand how to calculate them and the concepts behind them.