A Beginner's Guide to Machine Learning
Machine Learning is a fascinating field that empowers computers to learn from data and improve their performance over time without explicit human intervention.
More precisely, Machine Learning (ML) is a subfield of Artificial Intelligence (AI) that involves training computers to learn from data and make decisions or predictions without being explicitly programmed.
How does a Machine Learn?
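Machine Learning algorithms are trained on datasets, which are often large. During training they identify patterns in the data; the resulting model is then evaluated and used to make predictions on new, unseen data.
In short, ML revolves around two steps: training a model on data, then using that model to make predictions or decisions.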
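To make the train-then-predict idea concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple model, a straight line fitted with ordinary least squares, and a small made-up dataset (hours studied versus exam score). The names `fit_line` and `predict` are hypothetical helpers for illustration, not part of any library; real projects would usually rely on a library such as scikit-learn, but the two steps are the same.

```python
# Hypothetical training data: hours studied (feature) -> exam score (label)
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [52.0, 57.0, 61.0, 68.0, 72.0]


def fit_line(xs, ys):
    """Step 1 (training): learn a slope and intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the best-fit slope.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Step 2 (prediction): apply the learned line to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(train_x, train_y)
print(predict(model, 6.0))  # about 77.3
```

The model "learns" only two numbers here (slope 5.1, intercept 46.7), but the pattern of fitting parameters to data and then reusing them on new inputs is the same one larger models follow.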
Dataset
So what is a dataset? A dataset in ML is a collection of structured data (like a spreadsheet) or unstructured data (like text, images, or audio) that serves as the foundation for training, validating, and testing ML models.
A dataset usually contains features (inputs) and labels (outputs) for supervised learning, or only features for unsupervised learning.
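The sketch below shows one common way to represent these ideas in Python, assuming a small, made-up animal dataset. Each supervised row pairs a feature list with a label, the unsupervised version keeps only the features, and the hypothetical helper `split_dataset` shuffles the rows and cuts them into training, validation, and test sets. The 60/20/20 split ratio is an illustrative assumption, not a fixed rule.

```python
import random

# Hypothetical supervised dataset: each row pairs features with a label.
# Features are [height_cm, weight_kg]; labels are "cat" or "dog" (made-up values).
supervised = [
    ([23.0, 3.5], "cat"),
    ([61.0, 27.0], "dog"),
    ([25.0, 4.2], "cat"),
    ([55.0, 22.0], "dog"),
    ([24.0, 4.0], "cat"),
    ([70.0, 34.0], "dog"),
    ([22.0, 3.8], "cat"),
    ([48.0, 18.0], "dog"),
    ([26.0, 5.0], "cat"),
    ([65.0, 30.0], "dog"),
]

# For unsupervised learning, the same data would contain features only.
unsupervised = [features for features, _label in supervised]


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows, then cut them into training, validation, and test sets."""
    rows = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)  # fixed seed makes the split repeatable
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


train, val, test = split_dataset(supervised)
print(len(train), len(val), len(test))  # 6 2 2
```

Shuffling before splitting matters when the original rows are ordered (for example, all cats first), since otherwise one set could end up containing only one class.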
Dataset Format
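Datasets come in various formats depending on the type of data: tabular data is often stored in CSV files or spreadsheets, semi-structured records in JSON or XML, text in plain-text files, images as JPEG or PNG files, and audio as WAV or MP3 files.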
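As a rough illustration of working with two of these formats, the sketch below reads a small CSV table and a small JSON list using only Python's standard `csv` and `json` modules. The data is made up and embedded inline (via `io.StringIO`) so the example runs without any files; in practice you would open a file on disk instead.

```python
import csv
import io
import json

# Tabular (structured) data is often stored as CSV. This text is a
# made-up example, embedded inline instead of read from a file.
csv_text = """age,income,purchased
25,40000,no
38,72000,yes
52,58000,yes
"""

# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values arrive as strings, so numeric columns are converted explicitly.
features = [[int(r["age"]), int(r["income"])] for r in rows]
labels = [r["purchased"] for r in rows]

# Semi-structured records, such as text with annotations, are often stored as JSON.
json_text = (
    '[{"text": "Great product!", "sentiment": "positive"},'
    ' {"text": "Arrived broken.", "sentiment": "negative"}]'
)
records = json.loads(json_text)
texts = [rec["text"] for rec in records]
sentiments = [rec["sentiment"] for rec in records]

print(features)  # [[25, 40000], [38, 72000], [52, 58000]]
print(sentiments)  # ['positive', 'negative']
```

Whatever the source format, the goal is usually the same: turn the raw data into features (and labels, if any) that a model can consume.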
Where does a dataset come from?
Datasets can come from many sources. We can collect data manually through surveys, interviews, or experiments, or gather it from existing sources such as:
Open-source datasets: publicly available data from platforms like Kaggle, the UCI Machine Learning Repository, or Google Dataset Search, for example the Titanic and MNIST datasets.
Company or organizational data: internal databases, logs, or CRM systems, for example customer purchase history or a company's server logs.
Web scraping: one of the most popular ways to collect data, which extracts it from websites using tools like BeautifulSoup or Scrapy.
APIs: data pulled programmatically from services such as the Twitter API, OpenWeatherMap API, or Google Maps API.
IoT devices and sensors: data collected from smart devices or machinery.
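To give a feel for how web scraping turns a page into dataset rows, here is a minimal sketch. BeautifulSoup and Scrapy are third-party libraries, so to keep the example dependency-free it swaps in Python's built-in `html.parser` module instead; this is not BeautifulSoup's API. The HTML page is hypothetical and embedded inline, and `TableRowParser` is an illustrative class name. A real scraper would first download the page, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows of cell text
        self._row = None       # row currently being built, if any
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows use <th>, so they produce no cells and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a <td> is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page content (a real scraper would download this HTML).
page = """
<table>
  <tr><th>City</th><th>Temp (C)</th></tr>
  <tr><td>Oslo</td><td>4</td></tr>
  <tr><td>Cairo</td><td>31</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(page)
parser.close()

# Convert the scraped strings into typed dataset rows.
dataset = [(city, float(temp)) for city, temp in parser.rows]
print(dataset)  # [('Oslo', 4.0), ('Cairo', 31.0)]
```

Libraries like BeautifulSoup offer a much more convenient way to search the parsed page, but the underlying idea is the same: parse the HTML, pick out the elements you need, and convert their text into structured rows.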