
Introduction to ML

A Beginner's Guide to Machine Learning

Machine Learning is a fascinating field of AI that empowers computers to learn from data and improve their performance over time without human intervention.

Machine Learning is a subfield of Artificial Intelligence (AI) that involves training machines or computers to learn from data and make decisions or predictions without being explicitly programmed.

💡
It's about giving computers the ability to learn and improve on their own.

How Does a Machine Learn?

Machine Learning algorithms are trained on large datasets, where they identify patterns, and then use those patterns to evaluate and make predictions about new data.

It revolves around training a model using data and then using the model to make predictions or decisions.
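To make the train-then-predict idea concrete, here is a minimal sketch in plain Python (no ML library assumed). It fits a simple linear regression model, y = w * x + b, using the least-squares formulas. The data (hours studied versus exam score) is made up for illustration, and the function names `train` and `predict` are hypothetical.

```python
# A minimal sketch of the "train, then predict" workflow, using
# simple linear regression (y = w * x + b) fitted by least squares.

def train(xs, ys):
    """Learn slope w and intercept b from example (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope: covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Use the learned parameters to predict y for a new input x."""
    w, b = model
    return w * x + b


# Hypothetical training data: hours studied -> exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

model = train(hours, scores)   # step 1: learn from data
print(predict(model, 6))       # step 2: predict for an unseen input
```

Real projects usually rely on a library such as scikit-learn for this, but the two steps are the same: fit parameters on known data, then apply them to new inputs.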

Dataset

So, what is a dataset? A dataset in Machine Learning (ML) is a collection of structured data (like a spreadsheet) or unstructured data (like text, images, or audio) that serves as the foundation for training, validating, and testing ML models.

A dataset usually contains features (inputs) and labels (outputs) for supervised learning or just features for unsupervised learning.
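A small hypothetical example of how features and labels are separated, and how a dataset is commonly shuffled and split into training, validation, and test sets. The house-price rows, the 60/20/20 split ratios, and the `split` helper are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy dataset: each row is (features, label).
# Features: [size_m2, bedrooms]; label: price in thousands.
rows = [
    ([50, 1], 150), ([65, 2], 190), ([80, 2], 230), ([95, 3], 270),
    ([110, 3], 300), ([120, 4], 340), ([140, 4], 390), ([160, 5], 440),
    ([75, 2], 215), ([100, 3], 285),
]

# Supervised learning uses both features (X) and labels (y).
X = [features for features, _ in rows]
y = [label for _, label in rows]
# Unsupervised learning would use only X, with no labels.


def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of the data and split it into train/val/test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]  # whatever remains
    return train, val, test


train_set, val_set, test_set = split(rows)
print(len(train_set), len(val_set), len(test_set))
```

The training set is used to fit the model, the validation set to tune choices such as model settings, and the test set only once at the end to estimate performance on unseen data.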

Dataset Format

Datasets come in various formats depending on the type of data.

Fig: Dataset formats
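As a rough sketch of two common structured formats, the snippet below stores the same hypothetical records as CSV and as JSON and loads each with Python's standard library (the `csv` and `json` modules). The column names and values are made up for illustration.

```python
import csv
import io
import json

# The same hypothetical records stored in two common structured formats.
csv_text = """name,age,survived
Alice,29,1
Bob,41,0
"""

json_text = """[
  {"name": "Alice", "age": 29, "survived": 1},
  {"name": "Bob", "age": 41, "survived": 0}
]"""

# CSV: flat tabular rows; every value is read back as a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON: key/value (possibly nested) data; numbers keep their numeric type.
json_rows = json.loads(json_text)

print(csv_rows[0])   # values are strings, e.g. '29'
print(json_rows[0])  # values keep their types, e.g. 29
```

One practical consequence: CSV values often need explicit type conversion before they can be used as numeric features.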

Where Does a Dataset Come From?

Datasets can come from almost anywhere. We can collect data manually through surveys, interviews, or experiments, or gather it from the internet and other sources, such as:

Open-source datasets: publicly available data from platforms like Kaggle, the UCI Machine Learning Repository, or Google Dataset Search. Examples include the Titanic dataset and the MNIST dataset.

Company or organizational data: internal databases, logs, or CRM systems. Examples include customer purchase history and a company's server logs.

Web scraping: one of the most popular methods, which collects data from websites using tools like BeautifulSoup or Scrapy.

APIs: data can be pulled from APIs like the Twitter API, OpenWeatherMap API, or Google Maps API.

IoT devices and sensors: data is collected from smart devices or machinery.
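Web scraping can be sketched without third-party packages. The example below uses Python's built-in `html.parser` module as a stand-in for BeautifulSoup or Scrapy and extracts prices from a hypothetical HTML snippet. A real scraper would first download the page and should respect the site's terms of use and robots.txt. The `PriceParser` class and the page content are illustrative assumptions.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
page = """
<html><body>
  <ul class="products">
    <li class="price">19.99</li>
    <li class="price">5.49</li>
    <li class="note">Out of stock</li>
  </ul>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collect the text of every <li class="price"> element as a float."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "li" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_price = False

    def handle_data(self, data):
        # Only keep non-empty text found inside a price element
        if self.in_price and data.strip():
            self.prices.append(float(data.strip()))


parser = PriceParser()
parser.feed(page)
print(parser.prices)
```

With BeautifulSoup installed, the same lookup is typically written along the lines of `soup.find_all("li", class_="price")`, which is why it is a popular choice for scraping.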