This article was originally published in The New Stack on April 22, 2020.
In part one of this series, we defined data science and explored the role of a data scientist — including data preparation,
modeling, visualization, and discovery. We also introduced the role of a machine learning engineer who closely collaborates with the data scientist.
In this article, we’ll focus on machine learning (ML) and explore how computers can learn from data without anyone explicitly programming the flow
and logic of the learning process. We’ll also look at how computers can discover patterns and correlations in any data, no matter where it comes from or
what it is about. A machine can then use these patterns and correlations to perform a specific task, provide predictive or descriptive analytics, or
autonomously learn from ever-changing environments. It’s pretty amazing what ML can do!
At a high level, ML uses algorithms and statistical analysis to build models from sample data, usually historical data, referred to as training data.
These ML models are then used to make predictions or decisions about new data without being explicitly programmed to do so.
Put differently, ML lets a computer find patterns and correlations in data on its own, wherever that data originates and whatever it is about,
and then apply them to specific tasks, to predictive or descriptive analytics, or to learning autonomously from an ever-changing environment.
The main goal of ML is to automate the learning process for computers and eliminate human intervention. The use cases for ML are far-reaching and powerful.
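To make the train-then-predict idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any particular ML library works: the helper names `fit_line` and `predict` and the tiny data set (hypothetical advertising spend versus sales) are assumptions made up for this example. The "model" is a straight line fitted to historical data with ordinary least squares, which is then applied to a value it has never seen.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept from training data (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical historical (training) data: spend -> sales.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(history_x, history_y)   # "training"
prediction = predict(model, 5.0)         # prediction for new data
print(model, prediction)
```

Nobody wrote the relationship between spend and sales into the program; the slope and intercept were estimated entirely from the training data.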
Machine Learning versus Data Science
Data science is a broad term covering multiple disciplines that help data scientists gather data and understand it from a business perspective.
It also involves cleansing, transforming, and preparing the data for building a predictive or descriptive ML model.
Machine learning, by contrast, is a set of algorithms and statistical techniques for creating ML models that let computers learn without anyone
explicitly programming or instructing the learning process.
In this way, machine learning helps data scientists analyze data and create predictive or descriptive models. In other words, machine learning is one part of data
science and a critical tool that data scientists use in their data processing methodology to solve a particular task or problem.
Supervised Learning
In supervised learning, a human expert acts as a teacher and enriches the training data with the correct answers (the expected output). The computer learns the
patterns based on these correct answers. This type of training data is also known as labeled data. This approach produces a predictive model.
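As a rough illustration of learning from labeled data, the sketch below uses a k-nearest-neighbors classifier, one of the simplest supervised techniques, chosen here only for brevity. The function name `knn_predict`, the two numeric features, and the labels "small" and "large" are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Predict a label by majority vote among the k closest labeled examples."""
    neighbors = sorted(
        training_data,
        key=lambda example: math.dist(example[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (features, correct answer supplied by the "teacher").
labeled_data = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((0.9, 1.1), "small"),
    ((5.0, 5.0), "large"),
    ((5.2, 4.9), "large"),
    ((4.8, 5.1), "large"),
]

first_guess = knn_predict(labeled_data, (1.1, 1.0))
second_guess = knn_predict(labeled_data, (5.1, 5.0))
print(first_guess, second_guess)
```

The labels attached to each example are what make this supervised: without them, the function would have nothing to vote on.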
Unsupervised Learning
In unsupervised learning, a computer learns from unlabeled data, which excludes the correct or desired answers. As you can guess, no teacher is needed here.
This approach is mainly used to find patterns in the data and discover new, meaningful insights.
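One way to picture learning without labels is clustering. The sketch below is a bare-bones k-means, used here purely as an example of the idea. The function name `k_means`, the sample points, and the naive choice of the first `k` points as starting centroids are assumptions for illustration; real implementations initialize far more carefully.

```python
import math


def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters by repeatedly refining centroids."""
    # Assumption: naive initialization with the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return labels, centroids


# Hypothetical unlabeled data: no correct answers are provided.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]

labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

The algorithm is never told which group a point belongs to; the two groups emerge from the distances between the points alone.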
Feature Learning
One of the most reliable ways to improve the performance of your ML model is to train it with as much data as you can get. But obtaining labeled data can be
a very expensive and time-consuming process. If you have access to a large amount of unlabeled data, you can use feature learning techniques to discover
new features automatically instead of engineering them manually. In ML, a feature is an individual measurable property or characteristic
of a phenomenon being observed.
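As one simple stand-in for feature learning, the sketch below derives a single new feature from unlabeled data with principal component analysis (PCA), computed by power iteration. This is an assumption chosen for compactness, not the only or the most common approach. The function names `first_principal_component` and `learned_feature` and the sample rows are hypothetical.

```python
import math


def first_principal_component(rows, steps=100):
    """Find the direction of greatest variance in unlabeled data (power iteration)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    centered = [[row[d] - means[d] for d in range(dims)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    direction = [1.0] * dims
    for _ in range(steps):
        product = [
            sum(cov[i][j] * direction[j] for j in range(dims))
            for i in range(dims)
        ]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    return means, direction


def learned_feature(row, means, direction):
    """Project a row onto the learned direction to get one new feature value."""
    return sum((x - m) * d for x, m, d in zip(row, means, direction))


# Hypothetical unlabeled measurements with two raw properties per observation.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

means, direction = first_principal_component(rows)
feature_value = learned_feature(rows[0], means, direction)
print(direction, feature_value)
```

Here the two raw properties move together, so a single learned feature captures the variation that would otherwise need a hand-crafted combination of both.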
As we’ve seen, ML is a set of algorithms and statistical analysis used to create ML models that allow computers to learn without specific instructions.
We differentiate between supervised, unsupervised, and feature learning, but there are also numerous other learning categories. However, ML still requires
a data scientist for tasks such as manually identifying and creating new features. Your model is only as good as the assumptions on which it is based.
Therefore, you need an expert to ensure it is properly trained.
In our next article, we’ll explore deep learning and AI and how they enable computers to learn without human intervention, reaching nearly human-level
performance and sometimes even surpassing it.