Machine Learning - Multiple Regression

Multiple Regression

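Multiple regression is like linear regression, but with more than one independent variable, meaning that we try to predict a value based on two or more variables.

Take a look at the dataset below, which contains some information about cars. Volume is the engine displacement in cubic centimeters (ccm), Weight is the weight of the car in kilograms, and CO2 is the emission in grams per kilometer.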

Car	Model	Volume	Weight	CO2
Toyota	Aygo	1000	790	99
Mitsubishi	Space Star	1200	1160	95
Skoda	Citigo	1000	929	95
Fiat	500	900	865	90
Mini	Cooper	1500	1140	105
VW	Up!	1000	929	105
Skoda	Fabia	1400	1109	90
Mercedes	A-Class	1500	1365	92
Ford	Fiesta	1500	1112	98
Audi	A1	1600	1150	99
Hyundai	I20	1100	980	99
Suzuki	Swift	1300	990	101
Ford	Fiesta	1000	1112	99
Honda	Civic	1600	1252	94
Hyundai	I30	1600	1326	97
Opel	Astra	1600	1330	97
BMW	1	1600	1365	99
Mazda	3	2200	1280	104
Skoda	Rapid	1600	1119	104
Ford	Focus	2000	1328	105
Ford	Mondeo	1600	1584	94
Opel	Insignia	2000	1428	99
Mercedes	C-Class	2100	1365	99
Skoda	Octavia	1600	1415	99
Volvo	S60	2000	1415	99
Mercedes	CLA	1500	1465	102
Audi	A4	2000	1490	104
Audi	A6	2000	1725	114
Volvo	V70	1600	1523	109
BMW	5	2000	1705	114
Mercedes	E-Class	2100	1605	115
Volvo	XC70	2000	1746	117
Ford	B-Max	1600	1235	104
BMW	2	1600	1390	108
Opel	Zafira	1600	1405	109
Mercedes	SLK	2500	1395	120

We could predict the CO2 emissions of a car from its engine displacement alone, but with multiple regression we can add more variables, such as the weight of the car, to make the prediction more accurate.

How It Works

In Python, we have modules that can do this job. First, import the Pandas module:

import pandas

The Pandas module allows us to read CSV files and return a DataFrame object.

The file is meant for testing purposes only. You can download it here: cars.csv

df = pandas.read_csv("cars.csv")

Then make a list of the independent values and name this variable X.

Put the dependent values in a variable named y.

X = df[['Weight', 'Volume']]
y = df['CO2']

Tip: It is common to name the list of independent values with an uppercase X and the list of dependent values with a lowercase y.
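
As a minimal sketch of these two steps, the snippet below reads a few rows copied from the table above out of an in-memory string instead of cars.csv, then selects X and y. The comma-separated layout and the name csv_text are assumptions made for illustration; the real file may be laid out differently.

```python
import io

import pandas as pd

# Three rows copied from the table above, standing in for cars.csv.
# Assumption: the file is comma-separated with these column names.
csv_text = """Car,Model,Volume,Weight,CO2
Toyota,Aygo,1000,790,99
Mitsubishi,Space Star,1200,1160,95
Skoda,Citigo,1000,929,95
"""

df = pd.read_csv(io.StringIO(csv_text))

X = df[["Weight", "Volume"]]  # DataFrame: one column per independent variable
y = df["CO2"]                 # Series: the dependent variable

print(X.shape)  # (number of rows, 2)
print(y.shape)  # (number of rows,)
```

X keeps two dimensions (rows by variables) while y is a single column, which is the shape fit() expects.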

We will use some methods from the sklearn module, so we must also import this module:

from sklearn import linear_model

In the sklearn module, we will use LinearRegression() to create a linear regression object.

The object has a method named fit(), which takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:

regr = linear_model.LinearRegression()
regr.fit(X, y)

Now, we have a regression object that can predict CO2 values based on the weight and displacement of the car:

# Predict the CO2 emissions of a car with a weight of 2300kg and displacement of 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])
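
Because the model is fitted on a DataFrame with named columns, recent versions of scikit-learn may print a warning about missing feature names when predict() is given a plain list. One way to avoid that, sketched below, is to pass the new car as a DataFrame with the same columns. The data here is a handful of rows copied from the table above rather than the full cars.csv, so the predicted number will differ from the tutorial's result. The name new_car is only illustrative.

```python
import pandas as pd
from sklearn import linear_model

# A few rows copied from the table above (not the full cars.csv).
df = pd.DataFrame({
    "Volume": [1000, 1200, 900, 1500, 1600, 2000, 2500],
    "Weight": [790, 1160, 865, 1140, 1150, 1725, 1395],
    "CO2": [99, 95, 90, 105, 99, 114, 120],
})

X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Build the query with the same column names and order used for fitting.
new_car = pd.DataFrame({"Weight": [2300], "Volume": [1300]})
predicted = regr.predict(new_car)
print(predicted)
```

Naming the columns also guards against accidentally swapping weight and volume in the query.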

Example

Please see the complete example:

import pandas
from sklearn import linear_model

# Load the car data into a DataFrame
df = pandas.read_csv("cars.csv")

# Independent variables (X) and dependent variable (y)
X = df[['Weight', 'Volume']]
y = df['CO2']

# Create the regression object and fit it to the data
regr = linear_model.LinearRegression()
regr.fit(X, y)

# Predict the CO2 emissions of a car with a weight of 2300kg and displacement of 1300ccm:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)

Result:

[107.2087328]

We have predicted that a car with a 1.3-liter engine and a weight of 2300 kilograms will emit about 107 grams of carbon dioxide for every kilometer driven.

Coefficient

The coefficient is a factor that describes the relationship with an unknown variable.

Example: if x is a variable, then 2x is x two times. x is the unknown variable, and the number 2 is the coefficient.

In this case, we can ask for the coefficient value of weight against CO2, and of volume against CO2. The answers tell us what happens if we increase or decrease one of the independent values.

Example

Print the coefficient values of the regression object:

import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
print(regr.coef_)

Result:

[0.00755095 0.00780526]

Result interpretation

The result array represents the coefficient values of weight and displacement.

Weight: 0.00755095
Volume: 0.00780526

These values tell us that if the weight increases by 1 kg, the CO2 emissions will increase by 0.00755095 g.

If the engine size (volume) increases by 1 ccm, the CO2 emissions will increase by 0.00780526g.
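
Put differently, the model computes CO2 = intercept + (weight coefficient * Weight) + (volume coefficient * Volume). The sketch below checks this by hand against predict(). It uses a few rows copied from the table rather than the full cars.csv, so its coefficients will not match the ones printed above. intercept_ is the attribute in which LinearRegression stores the constant term.

```python
import pandas as pd
from sklearn import linear_model

# A few rows copied from the table above (not the full cars.csv).
df = pd.DataFrame({
    "Volume": [1000, 1200, 900, 1500, 1600, 2000, 2500],
    "Weight": [790, 1160, 865, 1140, 1150, 1725, 1395],
    "CO2": [99, 95, 90, 105, 99, 114, 120],
})

X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# coef_ follows the column order of X: Weight first, then Volume.
weight_coef, volume_coef = regr.coef_
weight, volume = 2300, 1300

# Apply the linear formula by hand.
manual = regr.intercept_ + weight_coef * weight + volume_coef * volume

# Ask the model for the same prediction.
query = pd.DataFrame({"Weight": [weight], "Volume": [volume]})
from_model = regr.predict(query)[0]

print(manual, from_model)
```

The two printed numbers should agree up to floating-point rounding, which is why each coefficient can be read as the change in CO2 per one-unit change in its variable while the other is held fixed.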

That seems like a reasonable estimate, but let's test it!

We have predicted that if a car equipped with a 1300ccm engine weighs 2300 kilograms, the carbon dioxide emissions will be about 107 grams.

What if we increase the weight by 1000 kg?

Example

Copy the previous example but change the car weight from 2300 to 3300:

import pandas
from sklearn import linear_model
df = pandas.read_csv("cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)

Result:

[114.75968007]

We have predicted that a car with a 1.3-liter engine and a weight of 3300 kilograms will emit about 115 grams of carbon dioxide for every kilometer driven.

This indicates that the coefficient of 0.00755095 is correct:

107.2087328 + (1000 * 0.00755095) = 114.75968
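
The same check can be written as a few lines of plain Python, using only the numbers reported in the results above. The variable names are illustrative, and the tolerance is an assumption chosen to absorb the rounding in the printed coefficient.

```python
# Values reported in the results above.
co2_at_2300 = 107.2087328    # prediction for 2300 kg
co2_at_3300 = 114.75968007   # prediction for 3300 kg
weight_coef = 0.00755095     # coefficient for Weight

# Adding 1000 kg should raise the prediction by 1000 times the coefficient.
expected = co2_at_2300 + 1000 * weight_coef

# The printed coefficient is rounded, so compare with a small tolerance.
matches = abs(expected - co2_at_3300) < 1e-5
print(expected, matches)
```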
