Machine Learning: Must Do Projects for Beginners

Popular Datasets in Machine Learning, Popular Models in Machine Learning

Published in

Artificial Intelligence in Plain English

4 min read · Feb 5, 2023

Machine learning is a branch of artificial intelligence that uses algorithms to learn from data in order to make predictions or decisions. Before you begin using machine learning, it is crucial to have a solid understanding of both the datasets you will be working with and the models you can train on them. In this article, we will look at some of the most popular machine learning datasets and, for each one, the models you can use to practice on it.

Iris Flowers

This well-known dataset contains 150 iris flowers, 50 from each of three species, with four measurements per flower: sepal length, sepal width, petal length, and petal width. It is frequently used for classification and clustering problems. The goal is to classify each flower into one of the three species. The models you should consider using are K-nearest neighbors (KNN), decision trees, Naive Bayes, neural networks, and support vector machines (SVMs). The dataset can be found at the following link.

Iris Flower Dataset

Iris flower data set used for multi-class classification.

www.kaggle.com

If you want to learn more about K-nearest neighbors (KNN), you can do so in the following article.

K-Nearest Neighbors (KNN) in depth

K-NN, machine learning, classification

medium.com

If you want to learn more about Support Vector Machines (SVM), follow the link below.

Support Vector Machines (SVM) in depth (part 1)

Functional Margin, Hinge Loss, Dual Problem, Lagrange Multipliers

medium.com

If you want to learn more about Bayes Classifiers, follow the link below.

Bayes Classifiers in depth

Bayes Theorem, Hypothesis, Probability, Machine Learning

ai.plainenglish.io

If you want to learn more about Neural Networks, follow the link below.

The Perceptron: A Foundational Building Block of Neural Networks

Neuron, Linear Binary Classifier, Deep Learning, Algebra

ai.plainenglish.io

If you want to learn more about Decision Trees, follow the link below.

Understanding Decision Trees: A Mathematical Perspective (part 1)

Information Entropy, Information Gain, Nodes, Tree

ai.plainenglish.io
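To make the KNN idea concrete, here is a minimal from-scratch sketch in plain Python. The six training rows are typed in by hand for illustration and stand in for the full 150-row dataset, and the function name knn_predict is my own choice, not part of any library.

```python
import math
from collections import Counter

# Hand-typed illustrative rows: (sepal length, sepal width,
# petal length, petal width) in cm, plus the species label.
# They stand in for the full dataset and are an assumption of this sketch.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Sort rows by Euclidean distance to the query and keep the k nearest.
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


prediction = knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2), k=3)
print(prediction)
```

With the real dataset you would typically hold out part of the rows for testing and try several values of k.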

Titanic Survival

This dataset includes details about passengers who boarded the Titanic, including their age, the fare they paid, and whether or not they survived. The goal is to predict whether each passenger survived. The models you should consider using are Logistic Regression, Decision Trees, Random Forests (to be posted), and Neural Networks. The dataset can be found at the following link.

Titanic - Machine Learning from Disaster

Start here! Predict survival on the Titanic and get familiar with ML basics

www.kaggle.com

If you want to learn more about Logistic Regression, you can check out this article.

Logistic regression in depth

Logistic regression, activation function, derivation, math

medium.com
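As a rough illustration of logistic regression on this kind of data, the sketch below trains a tiny model with batch gradient descent in plain Python. The eight passenger rows are made up for the example (they are not taken from the Kaggle files), and the scaling constants, learning rate, and epoch count are assumptions chosen to keep the toy problem well behaved.

```python
import math

# Made-up ((age in years, fare), survived) rows for illustration only;
# they are NOT taken from the Kaggle Titanic files.
PASSENGERS = [
    ((22, 7.3), 0), ((35, 8.1), 0), ((40, 10.5), 0), ((28, 13.0), 0),
    ((38, 71.3), 1), ((35, 53.1), 1), ((27, 60.0), 1), ((19, 80.0), 1),
]


def scale(features):
    """Bring age and fare to roughly the 0-1 range (assumed constants)."""
    age, fare = features
    return (age / 80.0, fare / 100.0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def survival_probability(weights, features):
    """weights = [bias, w_age, w_fare]; returns the modelled P(survived)."""
    age, fare = scale(features)
    return sigmoid(weights[0] + weights[1] * age + weights[2] * fare)


def train(rows, learning_rate=1.0, epochs=5000):
    """Batch gradient descent on the mean log loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for features, survived in rows:
            # Derivative of the log loss with respect to the linear score.
            error = survival_probability(weights, features) - survived
            age, fare = scale(features)
            for i, x in enumerate((1.0, age, fare)):
                gradient[i] += error * x / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


weights = train(PASSENGERS)
for features in [(30, 65.0), (30, 9.0)]:
    print(features, round(survival_probability(weights, features), 3))
```

The real dataset also has categorical columns and missing values, so it needs encoding and imputation before a model like this can be trained on it.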

Boston Housing

This dataset contains details on Boston-area house prices, including the average number of rooms per dwelling, the crime rate, and the age of the buildings. The goal is to predict the house price. The models you should consider using are Linear Regression, Decision Trees, Random Forests, and Neural Networks. The dataset can be found at the following link.

The Boston Housing Dataset

Explore and run machine learning code with Kaggle Notebooks | Using data from Boston House Prices

www.kaggle.com

If you want to learn more about Linear Regression, you can check out this article.

Linear Regression in depth

The directive equation of a straight line, simple linear regression, math, cost functions

medium.com
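For a feel of what simple linear regression does here, the sketch below fits a straight line with the closed-form least-squares formulas and reports the mean squared error. The five (rooms, price) pairs are invented for illustration and are not rows from the dataset. The helper names fit_line and mean_squared_error are my own.

```python
# Made-up (average rooms, price in $1000s) pairs for illustration only;
# they are not rows from the Boston Housing dataset.
ROOMS = [5.0, 6.0, 6.5, 7.0, 8.0]
PRICES = [15.0, 21.0, 23.0, 27.0, 34.0]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predicted and actual values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(ROOMS, PRICES)
mse = mean_squared_error(ROOMS, PRICES, slope, intercept)
print(f"price = {slope:.2f} * rooms + {intercept:.2f}, MSE = {mse:.3f}")
```

The real dataset has more than one feature, so in practice you would move on to multiple linear regression or the tree-based models listed above.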

Adult Income

This dataset contains information on the income and demographic characteristics of individuals and is commonly used for binary classification problems. The goal is to classify whether a person's income is more than 50 thousand dollars per year, or 50 thousand or less. The models you should consider using are Logistic Regression, Decision Trees, Random Forests, and Neural Networks. The dataset can be found at the following link.

Adult income dataset

A widely cited KNN dataset as a playground

www.kaggle.com
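A decision tree is built from single splits, so a one-split tree (a "decision stump") is a reasonable first experiment. The sketch below searches for the split with the lowest weighted Gini impurity. The eight rows and the two features (years of education, hours worked per week) are hypothetical values invented for the example, and label 1 stands for an income above 50 thousand.

```python
# Hypothetical ((education years, hours per week), label) rows;
# label 1 means income above 50 thousand. Not taken from the dataset.
PEOPLE = [
    ((9, 40), 0), ((10, 35), 0), ((9, 50), 0), ((12, 38), 0),
    ((13, 40), 1), ((14, 45), 1), ((16, 60), 1), ((13, 50), 1),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(rows):
    """Return (impurity, feature_index, threshold) of the best single split."""
    best = None
    for feature in range(len(rows[0][0])):
        values = sorted({x[feature] for x, _ in rows})
        # Candidate thresholds are midpoints between neighboring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for x, y in rows if x[feature] <= threshold]
            right = [y for x, y in rows if x[feature] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best


impurity, feature, threshold = best_stump(PEOPLE)
print(f"split on feature {feature} at {threshold} (impurity {impurity})")
```

A full decision tree repeats this search on each side of the split, and a random forest averages many such trees trained on resampled data.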

Digits

This dataset contains images of handwritten digits and is commonly used for image classification problems. The goal is to correctly identify the digit shown in each image. The models you should consider using are K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Neural Networks, and Convolutional Neural Networks (CNNs) (to be posted). The dataset can be found at the following link.

Digit Recognizer

Learn computer vision fundamentals with the famous MNIST data

www.kaggle.com
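Before any model sees the digits, each image usually has to be turned into numbers in a convenient range. The sketch below assumes a CSV layout in which every row holds a label followed by 784 grayscale pixel values from 0 to 255 (a 28x28 image). That layout is an assumption you should check against the header of the file you download. The row used here is synthetic, and parse_digit_row is a hypothetical helper name.

```python
def parse_digit_row(row):
    """Split one CSV row into (label, 28x28 grid of floats in [0, 1])."""
    values = [int(v) for v in row.split(",")]
    label, pixels = values[0], values[1:]
    if len(pixels) != 28 * 28:
        raise ValueError(f"expected 784 pixels, got {len(pixels)}")
    # Scale 0-255 intensities to 0-1, then cut the flat list into 28 rows.
    scaled = [p / 255.0 for p in pixels]
    grid = [scaled[r * 28:(r + 1) * 28] for r in range(28)]
    return label, grid


# Synthetic row: label 1 and a vertical stroke in column 14.
fake_pixels = [255 if i % 28 == 14 else 0 for i in range(784)]
fake_row = ",".join(["1"] + [str(p) for p in fake_pixels])

label, grid = parse_digit_row(fake_row)
print(label, len(grid), len(grid[0]), grid[0][14])
```

KNN and SVM models generally work on the flat 784-value vector, while a CNN expects the 28x28 grid.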

CIFAR-10

This dataset contains small images in 10 classes and is often used for image classification and computer vision problems. The goal is to correctly identify which object each image shows. The models you should consider using are Neural Networks, Convolutional Neural Networks (CNNs), and Random Forests. The dataset can be found at the following link.

CIFAR-10 - Object Recognition in Images

Identify the subject of 60,000 labeled images

www.kaggle.com
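The core operation inside a CNN is a small filter sliding over the image. The sketch below shows that operation on a tiny 4x4 grayscale grid in plain Python. The image and the vertical-edge kernel are toy values chosen for illustration, and a real CIFAR-10 model would apply many learned filters to color images through a deep learning library.

```python
def convolve2d(image, kernel):
    """Slide kernel over image with stride 1 and no padding.

    This is cross-correlation (the kernel is not flipped), which is what
    deep learning libraries usually mean by "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark left half, bright right half.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Toy kernel that responds to dark-to-bright vertical edges.
KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(IMAGE, KERNEL)
print(feature_map)
```

In a trained CNN the kernel values are learned from the data rather than written by hand.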

If you like the article, be sure to follow me to catch my new articles.

Written by Tech-AI-Math