Machine Learning - Polynomial Regression
Polynomial Regression
If your data points are clearly not suitable for linear regression (a straight line passing through the data points), polynomial regression may be the ideal choice.
Like linear regression, polynomial regression uses the relationship between the variables x and y to find the best way to draw a line through the data points.

Working principle
Python has methods for finding the relationship between data points and drawing a polynomial regression line. We will show you how to use these methods, rather than working through the mathematical formulas.
In the following example, we registered 18 cars passing through a specific toll station.
We have recorded each car's speed and the hour of the day it passed.
The x-axis represents the hour of the day, and the y-axis represents the speed:
Example
First draw the scatter plot:
import matplotlib.pyplot as plt

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

plt.scatter(x, y)
plt.show()
Result:

Example
Import numpy and matplotlib, then draw the polynomial regression line:
import numpy
import matplotlib.pyplot as plt

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

myline = numpy.linspace(1, 22, 100)

plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()
Result:

Example explanation
Import the required modules:
import numpy
import matplotlib.pyplot as plt
Create an array representing the values on the x and y axes:
x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]
NumPy has a method that allows us to establish a polynomial model:
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
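It can help to look at the two calls separately: polyfit finds the polynomial's coefficients by least squares, and poly1d wraps them in an object you can call like an ordinary function. A small sketch:

```python
import numpy

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

# polyfit returns the least-squares coefficients, highest degree first:
# a degree-3 fit yields 4 coefficients (x^3, x^2, x, constant)
coeffs = numpy.polyfit(x, y, 3)
print(len(coeffs))  # 4

# poly1d turns the coefficients into a callable polynomial
mymodel = numpy.poly1d(coeffs)
print(mymodel(10))  # the fitted curve evaluated at x = 10
```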
Then specify how the line will be drawn; we start at position 1 and end at position 22:
myline = numpy.linspace(1, 22, 100)
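linspace(start, stop, num) simply produces num evenly spaced values between start and stop, endpoints included; here it gives us 100 x positions at which to evaluate the fitted curve, so the plotted line looks smooth. A quick sketch:

```python
import numpy

# 100 evenly spaced values from 1 to 22, endpoints included
myline = numpy.linspace(1, 22, 100)
print(len(myline))            # 100
print(myline[0], myline[-1])  # 1.0 22.0
```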
Draw the original scatter plot:
plt.scatter(x, y)
Plot the polynomial regression line:
plt.plot(myline, mymodel(myline))
Display chart:
plt.show()
R-Squared
It is important to know how well the values on the x and y axes are related. If there is no relationship, polynomial regression cannot be used to predict anything.
This relationship is measured by a value called r-squared.
The r-squared value (the coefficient of determination) ranges from 0 to 1, where 0 indicates no correlation and 1 indicates 100% correlation.
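Behind the scenes, r-squared compares the model's squared errors against the variance of the data: R² = 1 − SS_res / SS_tot. The helper below (r_squared is a hypothetical name, used only for illustration) computes this value by hand:

```python
import numpy

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: the fraction of variance explained.

    Hypothetical helper for illustration; sklearn's r2_score
    computes the same quantity.
    """
    y_true = numpy.asarray(y_true, dtype=float)
    y_pred = numpy.asarray(y_pred, dtype=float)
    ss_res = numpy.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = numpy.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

# A perfect fit gives 1; always predicting the mean gives 0
print(r_squared([1, 2, 3], [1, 2, 3]))        # 1.0
print(r_squared([1, 2, 3], [2.0, 2.0, 2.0]))  # 0.0
```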
Python and the sklearn module will calculate this value for you. All you need to do is feed it the x and y arrays:
Example
How well does my data fit polynomial regression?
import numpy
from sklearn.metrics import r2_score

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

print(r2_score(y, mymodel(x)))
Note: The result of 0.94 indicates a good relationship, and we can use polynomial regression in future predictions.
Predicting future values
Now, we can use the collected information to predict future values.
For example: let's try to predict the speed of a car passing through the toll station at around 5:00 PM:
To do this, we need the same mymodel array as in the example above:
mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))
Example
Predict the speed of the car at 5:00 PM:
import numpy
from sklearn.metrics import r2_score

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

speed = mymodel(17)
print(speed)
The predicted speed in this example is 88.87, which we can also see in the figure:
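Because mymodel is an ordinary callable, it also accepts arrays, so we can predict several hours in one call. A sketch (the hours chosen here are just for illustration):

```python
import numpy

x = [1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 21, 22]
y = [100, 90, 80, 60, 60, 55, 60, 65, 70, 70, 75, 76, 78, 79, 90, 99, 99, 100]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

# poly1d objects accept arrays: predict speeds for several hours at once
hours = numpy.array([15, 17, 19])
speeds = mymodel(hours)
print(speeds)  # one predicted speed per hour
```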

Poor fit?
Let's create an example where polynomial regression is not the best method for predicting future values.
Example
These values for the x and y axes will cause the fit of polynomial regression to be very poor:
import numpy
import matplotlib.pyplot as plt

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

myline = numpy.linspace(2, 95, 100)

plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()
Result:

What about the r-squared value?
Example
You should get a very low r-squared value.
import numpy
from sklearn.metrics import r2_score

x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 26, 29, 48, 64, 6, 5, 36, 66, 72, 40]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 26, 34, 90, 33, 38, 20, 56, 2, 47, 15]

mymodel = numpy.poly1d(numpy.polyfit(x, y, 3))

print(r2_score(y, mymodel(x)))
Result: an r-squared value of 0.00995 indicates a very poor relationship and tells us that this dataset is not suitable for polynomial regression.