Machine Learning - Standard Deviation

What is the standard deviation?

Standard deviation is a number that describes how spread out a set of values is.

A low standard deviation indicates that most numbers are close to the mean (the average).

A high standard deviation indicates that these values are distributed over a wider range.

For example, suppose we have recorded the speeds of 7 cars:

speed = [86,87,88,86,87,85,86]

The standard deviation is:

0.9

This means that most values are within 0.9 of the mean, which is 86.4.

Let's deal with a wider range of numbers:

speed = [32, 111, 138, 28, 59, 77, 97]

The standard deviation is:

37.85

This means that most values are within 37.85 of the mean (which is 77.4).

As you can see, a higher standard deviation indicates that these values are distributed over a wider range.

The NumPy module has a method to calculate the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:

import numpy
speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)


Example

import numpy
speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.std(speed)
print(x)
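
The claim that most values lie within one standard deviation of the mean can be checked by counting them. The following is a minimal sketch; the helper name count_within_one_std is made up for illustration and is not part of NumPy.

```python
import numpy

def count_within_one_std(values):
    """Count how many values lie within one standard deviation of the mean.

    Hypothetical helper for illustration; not a NumPy function.
    """
    data = numpy.array(values, dtype=float)
    mean = numpy.mean(data)
    std = numpy.std(data)
    # Boolean mask: True where |value - mean| <= std
    within = numpy.abs(data - mean) <= std
    return int(numpy.sum(within))

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

print(count_within_one_std(narrow), "of", len(narrow))  # 5 of 7
print(count_within_one_std(wide), "of", len(wide))      # 4 of 7
```

In both lists more than half of the values fall inside the range, even though the range itself is far wider for the second list.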


Variance

Variance is another number that indicates the degree of dispersion of the values.

In fact, if you take the square root of the variance, you will get the standard deviation!

Or conversely, if you multiply the standard deviation by itself, you will get the variance!

To calculate the variance, you must perform the following operations:

1. Calculate the mean:

(32+111+138+28+59+77+97) / 7 ≈ 77.4

2. For each value, find the difference from the mean:

 32 - 77.4 = -45.4
111 - 77.4 =  33.6
138 - 77.4 =  60.6
 28 - 77.4 = -49.4
 59 - 77.4 = -18.4
 77 - 77.4 =  -0.4
 97 - 77.4 =  19.6

3. For each difference, find the square:

(-45.4)² = 2061.16
 (33.6)² = 1128.96
 (60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² =  338.56
 (-0.4)² =    0.16
 (19.6)² =  384.16

4. Variance is the average of these squared differences:

(2061.16 + 1128.96 + 3672.36 + 2440.36 + 338.56 + 0.16 + 384.16) / 7 ≈ 1432.25
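
The four steps above can be sketched in plain Python, without any library. The variable names are chosen here for illustration only.

```python
speed = [32, 111, 138, 28, 59, 77, 97]

# Step 1: the mean
mean = sum(speed) / len(speed)

# Step 2: the difference of each value from the mean
differences = [value - mean for value in speed]

# Step 3: the square of each difference
squared = [d ** 2 for d in differences]

# Step 4: the variance is the average of the squared differences
variance = sum(squared) / len(squared)

print(round(mean, 1))      # 77.4
print(round(variance, 2))  # 1432.24
```

The code keeps the unrounded mean, so its result can differ in the second decimal place from a hand calculation that first rounds the mean to 77.4.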

Fortunately, NumPy has a method for calculating variance:

Example

Use the NumPy var() method to find the variance:

import numpy
speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.var(speed)
print(x)
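
As a cross-check, the result of numpy.var() can be compared with Python's standard statistics module. This sketch relies on numpy.var() dividing by the number of values by default (its ddof parameter defaults to 0), which matches step 4 of the hand calculation and the behavior of statistics.pvariance().

```python
import math
import statistics
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

# numpy.var() divides by the number of values by default (ddof=0),
# the same definition that statistics.pvariance() uses.
numpy_variance = numpy.var(speed)
stdlib_variance = statistics.pvariance(speed)

# Compare with a tolerance because of floating-point rounding.
print(math.isclose(numpy_variance, stdlib_variance))  # True
```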


Standard Deviation

As we know, the standard deviation is the square root of the variance:

√1432.25 ≈ 37.85

Or, as shown in the example above, use NumPy to calculate the standard deviation:

Example

Use the NumPy std() method to find the standard deviation:

import numpy
speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.std(speed)
print(x)
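
The relationship between the two numbers can also be checked in code. The sketch below uses only numpy.var() and numpy.std() as shown in this chapter, plus math.sqrt() and math.isclose() from the standard library to compare the floating-point results.

```python
import math
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

variance = numpy.var(speed)
std = numpy.std(speed)

# The square root of the variance equals the standard deviation
# (compared with a tolerance because of floating-point rounding).
print(math.isclose(math.sqrt(variance), std))  # True

# Multiplying the standard deviation by itself gives back the variance.
print(math.isclose(std * std, variance))       # True
```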


Symbols

Standard deviation is usually represented by the Greek letter sigma: σ

Variance is usually represented by sigma squared: σ²
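
Written with these symbols, the calculation from this chapter can be summarized as follows. The notation is an assumption added here for illustration: N is the number of values, x_i is the i-th value, and μ is the mean.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```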

Chapter Summary

Standard deviation and variance are frequently used terms in machine learning, so it is very important to understand how to obtain them and the concepts behind them.