Machine Learning - Multiple Regression
Multiple Regression
Multiple regression is like linear regression, but with more than one independent variable: we try to predict a value based on two or more variables.
Please see the following dataset, which contains some information about cars.
Car | Model | Volume | Weight | CO2 |
---|---|---|---|---|
Toyota | Aygo | 1000 | 790 | 99 |
Mitsubishi | Space Star | 1200 | 1160 | 95 |
Skoda | Citigo | 1000 | 929 | 95 |
Fiat | 500 | 900 | 865 | 90 |
Mini | Cooper | 1500 | 1140 | 105 |
VW | Up! | 1000 | 929 | 105 |
Skoda | Fabia | 1400 | 1109 | 90 |
Mercedes | A-Class | 1500 | 1365 | 92 |
Ford | Fiesta | 1500 | 1112 | 98 |
Audi | A1 | 1600 | 1150 | 99 |
Hyundai | I20 | 1100 | 980 | 99 |
Suzuki | Swift | 1300 | 990 | 101 |
Ford | Fiesta | 1000 | 1112 | 99 |
Honda | Civic | 1600 | 1252 | 94 |
Hyundai | I30 | 1600 | 1326 | 97 |
Opel | Astra | 1600 | 1330 | 97 |
BMW | 1 | 1600 | 1365 | 99 |
Mazda | 3 | 2200 | 1280 | 104 |
Skoda | Rapid | 1600 | 1119 | 104 |
Ford | Focus | 2000 | 1328 | 105 |
Ford | Mondeo | 1600 | 1584 | 94 |
Opel | Insignia | 2000 | 1428 | 99 |
Mercedes | C-Class | 2100 | 1365 | 99 |
Skoda | Octavia | 1600 | 1415 | 99 |
Volvo | S60 | 2000 | 1415 | 99 |
Mercedes | CLA | 1500 | 1465 | 102 |
Audi | A4 | 2000 | 1490 | 104 |
Audi | A6 | 2000 | 1725 | 114 |
Volvo | V70 | 1600 | 1523 | 109 |
BMW | 5 | 2000 | 1705 | 114 |
Mercedes | E-Class | 2100 | 1605 | 115 |
Volvo | XC70 | 2000 | 1746 | 117 |
Ford | B-Max | 1600 | 1235 | 104 |
BMW | 2 | 1600 | 1390 | 108 |
Opel | Zafira | 1600 | 1405 | 109 |
Mercedes | SLK | 2500 | 1395 | 120 |
We can predict the CO2 emissions of a car from its engine displacement alone, but with multiple regression we can bring in more variables, such as the weight of the car, which may make the prediction more accurate.
Working principle
In Python, we have modules that can do this job. First, import the Pandas module:
import pandas
The Pandas module allows us to read csv files and return a DataFrame object.
The file used here is for testing purposes only and is available for download as cars.csv.
df = pandas.read_csv("cars.csv")
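If cars.csv is not at hand, the same kind of DataFrame can be built from text held in memory. The sketch below assumes that the file's columns are named as in the table above (Car, Model, Volume, Weight, CO2) and includes only the first five rows for brevity; csv_text is a name chosen here for illustration.

```python
import io

import pandas

# First five rows of the table above; column names are assumed to match cars.csv.
csv_text = """Car,Model,Volume,Weight,CO2
Toyota,Aygo,1000,790,99
Mitsubishi,Space Star,1200,1160,95
Skoda,Citigo,1000,929,95
Fiat,500,900,865,90
Mini,Cooper,1500,1140,105
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pandas.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.head())
```

Reading from a string keeps the example self-contained; with the real file, the call from the text (pandas.read_csv("cars.csv")) is all that is needed.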
Then make a list of the independent values and call this variable X.
Put the dependent values in a variable called y.
X = df[['Weight', 'Volume']]
y = df['CO2']
Tip: By convention, the list of independent values is named with an uppercase X, and the list of dependent values with a lowercase y.
We will use some methods from the sklearn module, so we must also import this module:
from sklearn import linear_model
From the sklearn module we will use LinearRegression() to create a linear regression object.
This object has a method named fit() that takes the independent and dependent values as parameters and fills the regression object with data describing the relationship:
regr = linear_model.LinearRegression()
regr.fit(X, y)
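To make "data describing the relationship" more concrete: LinearRegression performs ordinary least squares, which means it picks a constant term and one coefficient per column so that the squared prediction errors are as small as possible. The sketch below reproduces that idea with NumPy on six rows from the table. It illustrates the technique and is not scikit-learn's internal code; the names features, design, intercept, coef_weight and coef_volume are chosen here for illustration.

```python
import numpy as np

# Weight (kg) and Volume (ccm) for six cars from the table above.
features = np.array([
    [790, 1000],
    [1160, 1200],
    [929, 1000],
    [865, 900],
    [1140, 1500],
    [1109, 1400],
], dtype=float)
co2 = np.array([99, 95, 95, 90, 105, 90], dtype=float)

# Prepend a column of ones so the model can learn a constant term.
design = np.column_stack([np.ones(len(features)), features])

# Least-squares solution: params minimizes the squared error of design @ params vs co2.
params, *_ = np.linalg.lstsq(design, co2, rcond=None)
intercept, coef_weight, coef_volume = params

print("intercept:", intercept)
print("coefficients:", coef_weight, coef_volume)

# At the least-squares solution the residuals are orthogonal to every column.
residuals = co2 - design @ params
print(design.T @ residuals)  # all entries close to zero
```

Because only six rows are used, the numbers printed here will differ from those obtained with the full file.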
Now, we have a regression object that can predict CO2 values based on the weight and displacement of the car:
# Predict the CO2 emissions of a car with a weight of 2300kg and displacement of 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])
Example
Please see the complete example:
import pandas
from sklearn import linear_model

df = pandas.read_csv("cars.csv")

X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Predict the CO2 emissions of a car with a weight of 2300kg and displacement of 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])

print(predictedCO2)
Result:
[107.2087328]
We predict that a car equipped with a 1.3-liter engine and a weight of 2300 kilograms will emit about 107 grams of carbon dioxide per kilometer driven.
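Recent scikit-learn releases emit a warning about missing feature names when a model fitted on a DataFrame is asked to predict from a plain list. One way to avoid that, sketched below, is to pass the new car as a DataFrame with the same column names. The training data here is a six-row subset of the table, so the printed value will differ from the result above; new_car is a name chosen for illustration.

```python
import pandas
from sklearn import linear_model

# Six cars from the table above (a subset, for illustration only).
df = pandas.DataFrame({
    "Weight": [790, 1160, 929, 865, 1140, 1109],
    "Volume": [1000, 1200, 1000, 900, 1500, 1400],
    "CO2": [99, 95, 95, 90, 105, 90],
})

X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Same column names and order as X, so the features line up by name.
new_car = pandas.DataFrame({"Weight": [2300], "Volume": [1300]})
predictedCO2 = regr.predict(new_car)

print(predictedCO2)
```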
Coefficient
The coefficient is a factor that describes the relationship with an unknown variable.
For example: if x is a variable, then 2x is x two times. Here x is the unknown variable and the number 2 is the coefficient.
In this case, we can ask for the coefficient values of weight relative to CO2 and volume relative to CO2. The answer we get tells us what will happen if we increase or decrease one of the independent values.
Example
Print the coefficient values of the regression object:
import pandas
from sklearn import linear_model

df = pandas.read_csv("cars.csv")

X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

print(regr.coef_)
Result:
[0.00755095 0.00780526]
Result interpretation
The result array holds the coefficient values of weight and displacement, in the same order as the columns of X.
Weight: 0.00755095
Volume: 0.00780526
These values tell us that if the weight increases by 1 kg, the predicted CO2 emissions increase by 0.00755095 g.
If the engine size (volume) increases by 1 ccm, the predicted CO2 emissions increase by 0.00780526 g.
That seems like a plausible estimate, but let us test it!
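Since the values in coef_ follow the order of the columns in X, it can help to pair them with the column names explicitly. The sketch below does that and also prints intercept_, the fitted constant term that every prediction starts from. It uses a six-row subset of the table, so the numbers will not match the result above; named_coefficients is a name chosen for illustration.

```python
import pandas
from sklearn import linear_model

# Six cars from the table above (a subset, for illustration only).
df = pandas.DataFrame({
    "Weight": [790, 1160, 929, 865, 1140, 1109],
    "Volume": [1000, 1200, 1000, 900, 1500, 1400],
    "CO2": [99, 95, 95, 90, 105, 90],
})

X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# coef_ has one entry per column of X, in column order.
named_coefficients = dict(zip(X.columns, regr.coef_))
for name, value in named_coefficients.items():
    print(name, value)

# The constant term added to every prediction.
print("Intercept", regr.intercept_)
```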
We have predicted that if a car equipped with a 1300ccm engine weighs 2300 kilograms, the carbon dioxide emissions will be about 107 grams.
What if we increase the weight by 1000 kg?
Example
Copy the previous example but change the car weight from 2300 to 3300:
import pandas
from sklearn import linear_model

df = pandas.read_csv("cars.csv")

X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

predictedCO2 = regr.predict([[3300, 1300]])

print(predictedCO2)
Result:
[114.75968007]
We have predicted that a car equipped with a 1.3-liter engine and weighing 3.3 tons will emit about 115 grams of carbon dioxide for every kilometer traveled.
This indicates that the coefficient of 0.00755095 is correct:
107.2087328 + (1000 * 0.00755095) = 114.75968
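The same check can be written in a few lines of plain Python. The sketch below takes the two published coefficients and the first prediction, works backwards to the constant term (the text does not print it, so intercept here is derived, not quoted), and then recomputes both predictions by hand. Because the coefficients are rounded to eight decimals, the results agree with the model's output only to about five decimal places.

```python
# Values copied from the results above.
coef_weight = 0.00755095
coef_volume = 0.00780526
predicted_2300 = 107.2087328

# Derived, not quoted: the constant term implied by the first prediction.
intercept = predicted_2300 - (coef_weight * 2300 + coef_volume * 1300)


def predict_co2(weight, volume):
    """Linear model: constant term plus one weighted term per variable."""
    return intercept + coef_weight * weight + coef_volume * volume


print(round(predict_co2(2300, 1300), 5))  # 107.20873
print(round(predict_co2(3300, 1300), 5))  # 114.75968
```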