Machine Learning - Standard Deviation

What is the standard deviation?

Standard deviation is a number that describes how spread out the values are. It is the square root of the mean squared deviation from the mean.

A low standard deviation indicates that most numbers are close to the mean (the average).

A high standard deviation indicates that these values are distributed over a wider range.

For example, suppose we have recorded the speeds of 7 cars:

speed = [86,87,88,86,87,85,86]

The standard deviation is:

0.9

This means that most values are within 0.9 of the mean (which is 86.4).

Let's deal with a wider range of numbers:

speed = [32,111,138,28,59,77,97]

The standard deviation is:

37.85

This means that most values are within 37.85 of the mean (which is 77.4).

As you can see, a higher standard deviation indicates that these values are distributed over a wider range.
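The claim that most values lie within one standard deviation of the mean can be checked directly. The following is a minimal sketch; the helper name count_within_one_std is hypothetical and introduced only for illustration.

```python
import numpy

def count_within_one_std(values):
    """Hypothetical helper: count values within one standard deviation of the mean."""
    mean = numpy.mean(values)
    std = numpy.std(values)
    # A value counts if its distance from the mean is at most one standard deviation.
    return sum(1 for v in values if abs(v - mean) <= std)

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

print(count_within_one_std(narrow), "of", len(narrow))
print(count_within_one_std(wide), "of", len(wide))
```

For these two lists the counts are 5 of 7 and 4 of 7, so "most" here means a majority, not all of the values.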

The NumPy module has a method for calculating the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:

import numpy
speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)


Example

import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)


Variance

Variance is another number that indicates the degree of dispersion of the values.

In fact, if you take the square root of the variance, you will get the standard deviation!

Or vice versa, if you multiply the standard deviation by itself, you will get the variance!
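The relationship between the two numbers can be verified with NumPy. This is a small sketch that uses numpy.isclose for the comparison, because floating-point results are rarely exactly equal.

```python
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

variance = numpy.var(speed)
std = numpy.std(speed)

# The square root of the variance should match the standard deviation,
# and the standard deviation multiplied by itself should match the variance.
print(numpy.isclose(numpy.sqrt(variance), std))
print(numpy.isclose(std * std, variance))
```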

To calculate the variance, you must perform the following operations:

1. Calculate the mean:

(32+111+138+28+59+77+97) / 7 = 77.4

2. For each value: find the difference from the average:

 32 - 77.4 = -45.4
111 - 77.4 =  33.6
138 - 77.4 =  60.6
 28 - 77.4 = -49.4
 59 - 77.4 = -18.4
 77 - 77.4 =  -0.4
 97 - 77.4 =  19.6

3. For each difference: find the square value:

(-45.4)² = 2061.16
 (33.6)² = 1128.96
 (60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² =  338.56
 (-0.4)² =    0.16
 (19.6)² =  384.16

4. The variance is the average of these squared differences:

(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 = 1432.2
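The four steps above can be sketched in plain Python and compared with NumPy's result. The variable names (mean, differences, squared, variance) are chosen here for illustration.

```python
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

# Step 1: calculate the mean.
mean = sum(speed) / len(speed)

# Step 2: for each value, find the difference from the mean.
differences = [v - mean for v in speed]

# Step 3: square each difference.
squared = [d ** 2 for d in differences]

# Step 4: the variance is the average of the squared differences.
variance = sum(squared) / len(squared)

print(round(mean, 1))
print(round(variance, 2))

# Compare with NumPy's var() method.
print(numpy.isclose(variance, numpy.var(speed)))
```

Because the code keeps the unrounded mean instead of 77.4, its result can differ from the hand calculation in the last decimal place.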

Fortunately, NumPy has a method for calculating the variance:

Example

Use the NumPy var() method to find the variance:

import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.var(speed)
print(x)


Standard Deviation

As we have seen, the standard deviation is the square root of the variance:

√ 1432.25 = 37.85

Or, as in the earlier example, use NumPy to calculate the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:

import numpy
speed = [32,111,138,28,59,77,97]
x = numpy.std(speed)
print(x)


Symbols

Standard deviation is often represented by the symbol sigma: σ

Variance is often represented by the symbol sigma squared: σ²
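With these symbols, the calculation steps from the variance section can be summarized as formulas. The notation N (the number of values), x_i (the individual values) and μ (the mean) is an assumption introduced here for illustration and is not used elsewhere in this chapter.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```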

Chapter Summary

Standard deviation and variance are terms used frequently in machine learning, so it is important to understand how to calculate them and the concepts behind them.