Machine Learning - Introduction

Machine learning enables computers to learn by studying data and statistics.

Machine learning is a step toward artificial intelligence (AI).

A machine learning program analyzes data and learns to predict outcomes.

Where to start?

In this tutorial, we will go back to mathematics and study statistics, including how to calculate important values from a dataset.

We will also learn how to use various Python modules to obtain the answers we need.

Finally, we will learn how to write functions that can predict outcomes based on what we have learned.

Dataset

In computing, a dataset is any collection of data. It can be anything from an array to a complete database.

An example of an array:

[99,86,87,88,111,86,103,87,94,78,77,85,86]

An example of a database:

Carname  Color  Age  Speed  AutoPass
BMW      red    5    99     Y
Volvo    black  7    86     Y
VW       gray   8    87     N
VW       white  7    88     Y
Ford     white  2    111    Y
VW       white  17   86     Y
Tesla    red    2    103    Y
BMW      black  9    87     Y
Volvo    gray   4    94     N
Ford     white  11   78     N
Toyota   gray   12   77     N
VW       white  9    85     N
Toyota   blue   6    86     Y

By examining the array, we can guess that the average may be around 80 or 90, and we can also determine the maximum and minimum values, but what else can we do?
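As a quick check of that guess, here is a minimal Python sketch using the standard-library `statistics` module. It simply computes the mean, the maximum, and the minimum of the array above.

```python
import statistics

# The array from the example above
speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# The mean is the sum of the values divided by how many values there are
mean_speed = statistics.mean(speeds)

print(round(mean_speed, 2))  # 89.77
print(max(speeds))           # 111
print(min(speeds))           # 77
```

So the guess of "around 80 or 90" was close: the actual average is just under 90.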

By looking at the database, we can see that the most popular color is white and that the oldest car is 17 years old, but what if we could predict whether a car has AutoPass just by looking at its other values?
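The same observations can be made in code. The sketch below is a minimal illustration that assumes the table is typed in by hand as a Python list of tuples (the `cars` name and the tuple layout are our own choice, not part of any library). It uses `collections.Counter` to find the most common color and `max` to find the oldest car.

```python
from collections import Counter

# The table above, typed in by hand as (carname, color, age, speed, autopass) tuples
cars = [
    ("BMW", "red", 5, 99, "Y"),
    ("Volvo", "black", 7, 86, "Y"),
    ("VW", "gray", 8, 87, "N"),
    ("VW", "white", 7, 88, "Y"),
    ("Ford", "white", 2, 111, "Y"),
    ("VW", "white", 17, 86, "Y"),
    ("Tesla", "red", 2, 103, "Y"),
    ("BMW", "black", 9, 87, "Y"),
    ("Volvo", "gray", 4, 94, "N"),
    ("Ford", "white", 11, 78, "N"),
    ("Toyota", "gray", 12, 77, "N"),
    ("VW", "white", 9, 85, "N"),
    ("Toyota", "blue", 6, 86, "Y"),
]

# Count how often each color appears and take the most frequent one
color_counts = Counter(color for _, color, _, _, _ in cars)
most_common_color, count = color_counts.most_common(1)[0]

# The oldest car is the one with the largest age value
oldest_age = max(age for _, _, age, _, _ in cars)

print(most_common_color, count)  # white 5
print(oldest_age)                # 17
```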

This is what machine learning is for: analyzing data and predicting outcomes!

Machine learning usually works with very large datasets. In this tutorial, we will try to make the different concepts of machine learning as easy as possible to understand, using small datasets that are easy to follow.

Data Types

To analyze data, it is very important to understand the data types we are dealing with.

We can divide data types into three main categories:

  • Numerical
  • Categorical
  • Ordinal

Numerical data are numbers, and can be split into two numerical categories:

Discrete data
- Numbers that are limited to integers. Example: the number of cars passing by.
Continuous data
- Numbers that can take an infinite number of values. Examples: the price of an item, or the size of an item.

Categorical data are values that cannot be measured up against each other. Examples: a color value, or any yes/no value.

Ordinal data are like categorical data, but can be measured up against each other. Example: school grades, where A is better than B, and so on.
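The difference between ordinal and categorical data can be sketched in Python. This is a minimal illustration that assumes a hypothetical grade scale running from F (worst) to A (best); the `GRADE_ORDER` list and the `grade_rank` helper are our own names. For ordinal data the order has to be stated explicitly, while categorical data such as colors only support checks for equality.

```python
# Hypothetical grade scale, listed from worst to best (an assumption for illustration)
GRADE_ORDER = ["F", "D", "C", "B", "A"]

def grade_rank(grade: str) -> int:
    """Return the grade's position in GRADE_ORDER; a higher rank means a better grade."""
    return GRADE_ORDER.index(grade)

# Ordinal data: grades can be measured up against each other
print(grade_rank("A") > grade_rank("B"))  # True: A is better than B

grades = ["B", "A", "C", "A"]
print(sorted(grades, key=grade_rank, reverse=True))  # ['A', 'A', 'B', 'C']

# Categorical data: colors can only be compared for equality.
# Python would also allow "red" < "white" (alphabetical order), but that order means nothing here.
print("red" == "white")  # False
```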

By knowing the data type of your data source, you will know which technique to use when analyzing it.
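As one illustration of how the data type affects the analysis, the sketch below picks a summary value per type: the mean for numerical data, the middle (median) value for ordinal data, and the most frequent value (mode) for categorical data. This is a simplified, hypothetical helper (`summarize` and its `kind` and `order` arguments are our own names) built on Python's standard `statistics` module; real analyses involve more choices than this.

```python
import statistics

def summarize(values, kind, order=None):
    """Hypothetical helper: return a summary value that suits the given data type.

    For ordinal data, `order` must list the categories from lowest to highest.
    """
    if kind == "numerical":
        # Numbers can be added and divided, so the mean makes sense
        return statistics.mean(values)
    if kind == "ordinal":
        # Categories can be ranked but not averaged: take the middle rank (lower median)
        ranks = [order.index(v) for v in values]
        return order[statistics.median_low(ranks)]
    if kind == "categorical":
        # Categories can only be counted, so report the most frequent value
        return statistics.mode(values)
    raise ValueError(f"unknown data type: {kind}")

speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
colors = ["red", "black", "gray", "white", "white", "white", "red",
          "black", "gray", "white", "gray", "white", "blue"]
grades = ["B", "A", "C", "A", "B"]  # hypothetical school grades

print(round(summarize(speeds, "numerical"), 2))                 # 89.77
print(summarize(colors, "categorical"))                         # white
print(summarize(grades, "ordinal", ["F", "D", "C", "B", "A"]))  # B
```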

In the next chapter, you will learn more about statistics and data analysis.