Machine Learning - Standard Deviation
What is the standard deviation?
Standard deviation is a number that describes how spread out a set of values is.
A low standard deviation indicates that most numbers are close to the mean (the average).
A high standard deviation indicates that these values are distributed over a wider range.
For example, suppose we have recorded the speeds of 7 cars:
speed = [86,87,88,86,87,85,86]
The standard deviation is:
0.9
This means that most values are within 0.9 of the mean, which is 86.4.
Now let's do the same with a set of numbers that covers a wider range:
speed = [32, 111, 138, 28, 59, 77, 97]
The standard deviation is:
37.85
This means that most values are within 37.85 of the mean (which is 77.4).
As you can see, a higher standard deviation indicates that these values are distributed over a wider range.
The NumPy module has a method to calculate the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
import numpy

speed = [86,87,88,86,87,85,86]
x = numpy.std(speed)
print(x)
Example
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.std(speed)
print(x)
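The claim that most values lie within one standard deviation of the mean can be checked directly. The following is a minimal sketch, not part of the original tutorial; the helper name count_within_one_std is hypothetical and chosen only for illustration.

```python
import numpy

def count_within_one_std(values):
    """Return how many values lie within one standard deviation of the mean."""
    mean = numpy.mean(values)
    std = numpy.std(values)
    # Count the values whose distance from the mean is at most one std
    return sum(1 for v in values if abs(v - mean) <= std)

narrow = [86, 87, 88, 86, 87, 85, 86]
wide = [32, 111, 138, 28, 59, 77, 97]

print(count_within_one_std(narrow), "of", len(narrow))  # 5 of 7
print(count_within_one_std(wide), "of", len(wide))      # 4 of 7
```

In both lists, more than half of the values fall inside the range from mean minus standard deviation to mean plus standard deviation, even though that range is much wider for the second list.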
Variance
Variance is another number that indicates the degree of dispersion of the values.
In fact, if you take the square root of the variance, you will get the standard deviation!
Or conversely, if you multiply the standard deviation by itself, you will get the variance!
To calculate the variance, you must perform the following operations:
1. Calculate the mean:
(32+111+138+28+59+77+97) / 7 ≈ 77.4
2. For each value, find the difference from the mean:
32 - 77.4 = -45.4
111 - 77.4 = 33.6
138 - 77.4 = 60.6
28 - 77.4 = -49.4
59 - 77.4 = -18.4
77 - 77.4 = -0.4
97 - 77.4 = 19.6
3. For each difference, find the square:
(-45.4)² = 2061.16
(33.6)² = 1128.96
(60.6)² = 3672.36
(-49.4)² = 2440.36
(-18.4)² = 338.56
(-0.4)² = 0.16
(19.6)² = 384.16
4. The variance is the average of these squared differences:
(2061.16 + 1128.96 + 3672.36 + 2440.36 + 338.56 + 0.16 + 384.16) / 7 ≈ 1432.25
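The four steps above can be sketched in plain Python, without NumPy. This is an illustrative sketch rather than the tutorial's own code, and the variable names are assumptions. Because the code keeps the unrounded mean instead of 77.4, its result can differ from the hand calculation in the last digit.

```python
speed = [32, 111, 138, 28, 59, 77, 97]

# Step 1: calculate the mean
mean = sum(speed) / len(speed)

# Step 2: for each value, find the difference from the mean
differences = [value - mean for value in speed]

# Step 3: square each difference
squared = [d ** 2 for d in differences]

# Step 4: the variance is the average of the squared differences
variance = sum(squared) / len(squared)

print(round(mean, 1))      # 77.4
print(round(variance, 2))  # 1432.24
```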
Fortunately, NumPy has a method for calculating variance:
Example
Use the NumPy var() method to find the variance:
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.var(speed)
print(x)
Standard Deviation
As we have seen, the standard deviation is the square root of the variance:
√1432.25 ≈ 37.85
Or, as shown in the example above, use NumPy to calculate the standard deviation:
Example
Use the NumPy std() method to find the standard deviation:
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]
x = numpy.std(speed)
print(x)
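The relationship between the two numbers can also be checked numerically. The sketch below assumes only numpy.var(), numpy.std(), and the standard library's math.isclose(), which is used because floating-point results are rarely exactly equal.

```python
import math
import numpy

speed = [32, 111, 138, 28, 59, 77, 97]

variance = numpy.var(speed)
std = numpy.std(speed)

# The square root of the variance should match the standard deviation,
# and the standard deviation squared should match the variance.
print(math.isclose(math.sqrt(variance), std))  # True
print(math.isclose(std * std, variance))       # True
```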
Symbol
Standard deviation is usually represented by the symbol sigma: σ
Variance is usually represented by the symbol sigma squared: σ²
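Written with these symbols, the calculation described in the Variance section can be summarized as follows. The names N (the number of values), x_i (the individual values), and μ (the mean) are assumed notation for illustration and do not appear in the text above.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```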
Chapter Summary
Standard deviation and variance are frequently used terms in machine learning, so it is very important to understand how to obtain them and the concepts behind them.