Machine Learning - Scatter Plot
Scatter Plot
A scatter plot is a graph where each value in the dataset is represented by a point.

Matplotlib has a method for drawing scatter plots, which requires two arrays of the same length, one for the x-axis values and the other for the y-axis values:
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
The x array represents the age of each car.
The y array represents the speed of each car.
Example
Use the scatter() method to draw a scatter plot:
import matplotlib.pyplot as plt

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

plt.scatter(x, y)
plt.show()
Result:

Scatter Plot Explanation
The x-axis represents the age of the car, and the y-axis represents the speed.
As the figure shows, the two fastest cars were both 2 years old, and the slowest car was 12 years old.
Note: It seems that the newer the car, the faster it drives, but this could be a coincidence. After all, we only registered 13 cars.
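The reading of the plot can also be checked directly against the data. The following is a minimal sketch in plain Python; the variable names cars, fastest_two and slowest are chosen here for illustration and are not part of the original example:

```python
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]                 # age of each car
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]     # speed of each car

# Pair each speed with its age and sort with the fastest car first
cars = sorted(zip(y, x), reverse=True)

fastest_two = cars[:2]   # the two highest speeds
slowest = cars[-1]       # the lowest speed

print(fastest_two)  # [(111, 2), (103, 2)]
print(slowest)      # (77, 12)
```

Each pair is (speed, age), so the output confirms that both of the fastest cars are 2 years old and the slowest car is 12 years old.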
Random Data Distribution
In machine learning, datasets can contain thousands, even millions, of values.
When testing algorithms, you may not have real data, and you may have to use randomly generated values.
As we learned in the previous chapter, the NumPy module can help us!
Let's create two arrays, both filled with 1000 random numbers from a normal data distribution.
The mean of the first array is set to 5.0, with a standard deviation of 1.0.
The mean of the second array is set to 10.0, with a standard deviation of 2.0:
Example
Scatter plot with 1000 points:
import numpy
import matplotlib.pyplot as plt

x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)

plt.scatter(x, y)
plt.show()
Result:

Scatter Plot Explanation
We can see that the points are concentrated around the value 5 on the x-axis and 10 on the y-axis.
We can also see that the points are spread out more along the y-axis than along the x-axis, which reflects the larger standard deviation of the second array (2.0 versus 1.0).
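What the plot shows can be cross-checked numerically. The sketch below generates the same kind of data and computes the mean and standard deviation of each array with numpy.mean() and numpy.std(). The call to numpy.random.seed() and the seed value 42 are assumptions added here so that the run is repeatable; they are not part of the original example:

```python
import numpy

# Assumption: fix the seed (42 is arbitrary) so repeated runs give the same numbers
numpy.random.seed(42)

x = numpy.random.normal(5.0, 1.0, 1000)
y = numpy.random.normal(10.0, 2.0, 1000)

# Sample mean and standard deviation of each array
mean_x, std_x = numpy.mean(x), numpy.std(x)
mean_y, std_y = numpy.mean(y), numpy.std(y)

print(mean_x, std_x)  # expected to be close to 5.0 and 1.0
print(mean_y, std_y)  # expected to be close to 10.0 and 2.0
```

Because the values are random, the results will not be exactly 5.0, 1.0, 10.0 and 2.0, but with 1000 points they should land close to them.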