Machine Learning - Linear Regression
Regression
The term 'regression' is used when you try to find relationships between variables.
This relationship is used to predict the results of future events in machine learning and statistical modeling.
Linear regression
Linear regression uses the relationship between the data points to draw a straight line through all of them.
This line can be used to predict future values.

In machine learning, predicting the future is very important.
Working Principle
Python has methods for finding the relationship between data points and for drawing a line of linear regression. We will show you how to use these methods instead of going through the mathematical formulas.
In the following example, the x-axis represents the age of the car, and the y-axis represents the speed. We have recorded the age and speed of 13 cars passing through the toll station. Let's see if the data we have collected can be used for linear regression:
Example
First draw the scatter plot:
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

plt.scatter(x, y)
plt.show()
Result:

Example
Import scipy and draw the line of linear regression:
import matplotlib.pyplot as plt
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result:

Example Explanation
Import the required modules:
import matplotlib.pyplot as plt
from scipy import stats
Create an array representing the values on the x and y axes:
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
Execute a method that returns some important key values of linear regression:
slope, intercept, r, p, std_err = stats.linregress(x, y)
Create a function that uses the slope and intercept values to return a new value. This new value represents where on the y-axis the corresponding x value will be placed:
def myfunc(x):
    return slope * x + intercept
Run each value of the x array through the function. This will produce a new array with new values on the y-axis:
mymodel = list(map(myfunc, x))
Plot the original scatter plot:
plt.scatter(x, y)
Plot the linear regression line:
plt.plot(x, mymodel)
Display the graph:
plt.show()
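As an aside, the key values that stats.linregress() returns can also be derived by hand from the least-squares formulas. The following is a minimal sketch in plain Python, for illustration only (variable names such as sxy are our own):

```python
# Least-squares fit computed by hand -- a sketch of what
# stats.linregress() does internally (illustration only).
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and cross-deviations
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

slope = sxy / sxx                    # best-fit slope
intercept = mean_y - slope * mean_x  # best-fit intercept
r = sxy / (sxx * syy) ** 0.5         # correlation coefficient

print(slope, intercept, r)
```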
R-Squared
It is important to know how well the values on the x-axis and y-axis are related. If there is no relationship, linear regression cannot be used to predict anything.
This relationship is measured with a value called r-squared (the coefficient of determination).
The r-squared value ranges from 0 to 1, where 0 means no relationship and 1 means 100% related.
Python and the SciPy module will calculate this for you: stats.linregress() returns the correlation coefficient r, and squaring it gives r-squared. All you have to do is feed it the x and y values:
Example
How well does my data fit in linear regression?
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)
Note: The result -0.76 is the correlation coefficient r (so r-squared is about 0.58). It shows that there is a relationship, though not a perfect one, and it indicates that we can use linear regression in future predictions.
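One detail worth making explicit (our addition, not part of the original example): stats.linregress() returns the correlation coefficient r, not r-squared itself, so squaring r gives the coefficient of determination:

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

# Square the correlation coefficient to get r-squared
print(r ** 2)
```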
Predict future values
Now, we can use the collected information to predict future values.
For example: let us try to predict the speed of a 10-year-old car.
To do so, we need the same myfunc() function as in the previous example:
def myfunc(x):
    return slope * x + intercept
Example
Predict the speed of a 10-year-old car:
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

speed = myfunc(10)
print(speed)
The predicted speed is 85.6, and we can also read it from the diagram:
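As a small extension of the example (the ages 5, 10 and 15 below are our own illustrative inputs), the same fitted function can predict speeds for several car ages at once:

```python
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predict the speed for several hypothetical car ages
for age in (5, 10, 15):
    print(age, round(myfunc(age), 1))
```

Because the slope is negative, the predicted speed decreases as the car's age increases.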

Poor fit?
Let us create an example where linear regression would not be the best method to predict future values.
Example
These values for the x and y axes will result in a very poor fit for linear regression:
import matplotlib.pyplot as plt
from scipy import stats

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
Result:

And the r-squared value?
Example
You should get a very low r-squared value:
from scipy import stats

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

print(r)
Result: The value 0.013 indicates a very poor relationship, and tells us that this data set is not suitable for linear regression.
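For comparison (our addition), squaring this r value gives an r-squared close to zero, which confirms the poor fit:

```python
from scipy import stats

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

slope, intercept, r, p, std_err = stats.linregress(x, y)

# r-squared near 0 means the line explains almost none of the variation
print(r ** 2)
```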