A Beginner’s Guide to Training a YOLOv5 Object Detection Model

Published in

Artificial Intelligence in Plain English

12 min read · Feb 24, 2023

Object detection has gained a lot of attention in recent years thanks to its wide range of applications in areas such as self-driving cars, security systems, and healthcare. YOLOv5 (You Only Look Once version 5) is a widely used object detection model, popular for its combination of accuracy and fast inference. In this beginner’s guide, we will walk through the steps involved in training a YOLOv5 model on a custom dataset: data preparation, model configuration, and training. Whether you’re a beginner in deep learning or an experienced practitioner, this guide will give you a solid foundation for training your own YOLOv5 model.

In order to train a YOLOv5 object detection model, you must have a dataset of annotated images that indicates the location of the objects you want the model to detect. While open-source datasets like COCO and Pascal VOC are available, a custom dataset can be more precise for your specific use case. Dividing the dataset into training and validation sets is also a crucial step to ensure the accuracy and reliability of your model.

This article guides you through the entire process of training and evaluating your own YOLOv5 model. We will introduce YOLOv5, discuss how to annotate and prepare your dataset, and then give step-by-step instructions for training the model on that dataset, starting from pre-trained weights. Lastly, we will explain how to evaluate your model’s performance, so that you have what you need to get started with object detection using YOLOv5.

Introduction

YOLOv5 is a member of the YOLO (You Only Look Once) family of real-time object detection models. Developed by Glenn Jocher and the team at Ultralytics, YOLOv5 is implemented natively in PyTorch, unlike its Darknet-based predecessors YOLOv3 and YOLOv4, and it is no longer the newest model in the family. It reports strong speed and accuracy on well-known benchmarks such as COCO and Pascal VOC.

So what makes YOLOv5 so special? Its architecture! YOLOv5 uses a CSPDarknet backbone, which relies on cross-stage partial connections to improve information flow and feature reuse. It also uses spatial pyramid pooling and the SiLU (Swish) activation function to improve detection accuracy further. All this while keeping real-time inference speeds, reported at up to 140 frames per second on a single GPU for the smallest model, which makes it a fast and accurate choice. If you’re looking to develop applications in fields like autonomous vehicles, robotics, or surveillance systems, YOLOv5 is definitely worth a closer look.
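
To make the activation function mentioned above concrete: Swish with beta = 1 is the function PyTorch calls SiLU, that is, x multiplied by sigmoid(x). The snippet below is only a pure-Python sketch with an illustrative function name (silu); in a real model you would use the framework's built-in layer, such as torch.nn.SiLU.

```python
import math


def silu(x: float) -> float:
    """Swish with beta = 1 (also called SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Negative inputs are damped toward zero, positive inputs pass almost unchanged.
    for value in (-2.0, 0.0, 2.0):
        print(value, round(silu(value), 4))
```

Unlike ReLU, the function is smooth and lets small negative values through, which is the property usually cited as the reason for using it.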

Dataset and Annotation Tool

The foundation of any successful object detection project is a well-curated dataset. The dataset provides the model with the necessary information to recognize and localize objects accurately. Therefore, before diving into model training, it is crucial to gather or create a good dataset that covers a wide range of scenarios and variations that you anticipate the model to encounter in the real world.

Custom datasets play a critical role in achieving high accuracy and specificity in object detection tasks. While several open-source datasets are available, such as the COCO dataset and EgoHands dataset, creating a custom dataset that caters to your specific use case can be more effective. Annotation tools such as LabelImg and Roboflow can help in creating such a dataset. In my previous article titled “Building Accurate Object Detection Models with RetinaNet: A Comprehensive Step-by-Step Guide”, I utilized LabelImg for image annotation. However, in this article, I will be demonstrating how to use Roboflow, a powerful annotation tool that can streamline the dataset preparation process.

Roboflow is an all-in-one data management platform designed for computer vision tasks, such as object detection, image classification, and segmentation. It offers a range of tools for image annotation, dataset generation, and model deployment. With its integrations with popular deep learning frameworks like TensorFlow and PyTorch, Roboflow makes it easy to train models using your custom dataset.

One of the standout features of Roboflow is its intuitive web-based image annotation tool. The tool allows you to label images with bounding boxes, polygons, and other shapes that indicate the location of objects in the image. Its user-friendly interface and efficiency make it easy to label large datasets quickly and accurately. Additionally, Roboflow offers various features for managing datasets, including data augmentation, data cleaning, and data export. It also provides pre-processing options like resizing and normalization, which can improve model performance.

Overall, Roboflow is an excellent tool for preparing and managing custom datasets for computer vision tasks. Its comprehensive features and user-friendly interface make it ideal for both beginners and experienced practitioners. To start using Roboflow, create an account and set up a new workspace to begin creating new projects.

After creating a new project in Roboflow, you can begin uploading the images you want to annotate. You can upload images directly from your computer or from cloud storage services such as Google Drive or Dropbox. Once the images are uploaded, you can begin annotating them using the platform’s annotation tools.

After uploading your images to Roboflow, the next step is to assign them for annotation. Annotation is a crucial process in creating a custom dataset, and Roboflow makes it easy with its intuitive interface. Simply draw bounding boxes around the objects of interest, and the tool will automatically save the annotations. Once you have annotated your images, a dialog box will appear, allowing you to split the images into training, validation, and test sets. It is recommended to have at least 60–70% of your data as training data, as this provides the model with enough examples to learn from. However, the split can be adjusted to meet your specific needs. With Roboflow’s annotation tool and easy-to-use data management features, creating a custom dataset for object detection tasks has never been easier.

To demonstrate the process of uploading images for annotation, I have randomly selected 8 images. These images are used as examples to illustrate the annotation process in Roboflow.

After completing the annotation process, you can create a dataset by dividing the annotated images into train, validation, and test sets.
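
Roboflow performs the split for you, but it can help to see what a reproducible train/validation/test split looks like. The following is a minimal sketch under my own assumptions: the function name split_dataset, the 70/20/10 default fractions, and the file names in the usage example are all illustrative and do not come from Roboflow.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_train = round(len(shuffled) * train_frac)
    # Never hand out more validation items than remain after the train slice.
    n_val = min(round(len(shuffled) * val_frac), len(shuffled) - n_train)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, val, test


if __name__ == "__main__":
    images = [f"img_{i:02d}.jpg" for i in range(10)]  # hypothetical file names
    train, val, test = split_dataset(images)
    print(len(train), len(val), len(test))
```

Fixing the seed matters more than the exact fractions: it keeps the validation set stable between experiments, so that metrics from different runs are comparable.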

After adding images to the dataset and annotating them, the next step is to generate a new version of the dataset for exporting to your local IDE or Google Colab. During the generation process, you can choose the preprocessing and augmentation steps as per your requirements. For instance, you can resize the images or apply data augmentation techniques like rotation or flipping to increase the diversity of the dataset. Once the generation process is complete, you can easily export your dataset and use it for training your object detection model.
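
One detail worth understanding about augmentations such as flipping is that the annotations have to be transformed together with the image. Roboflow handles this automatically; the sketch below only illustrates the idea for a horizontal flip, assuming boxes are stored as normalized (x_center, y_center, width, height) values between 0 and 1. The function names are illustrative.

```python
def hflip_box(box):
    """Mirror a normalized (x_center, y_center, width, height) box left to right."""
    x_center, y_center, width, height = box
    # Only the horizontal center moves; the box size and vertical position stay the same.
    return (1.0 - x_center, y_center, width, height)


def hflip_labels(labels):
    """Apply the flip to a list of (class_id, box) pairs, keeping class ids unchanged."""
    return [(class_id, hflip_box(box)) for class_id, box in labels]


if __name__ == "__main__":
    labels = [(0, (0.25, 0.5, 0.1, 0.2))]
    print(hflip_labels(labels))
```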

After generating a new version of the dataset, the next step is to export it in a suitable format for training the model. Roboflow can export the dataset in various formats, including the YOLOv5 PyTorch format, which we will use in this example. This is the layout that the YOLOv5 training script expects. Once you have selected the desired export format, Roboflow will generate a download link and code snippet for the exported dataset. You can then use it to download the dataset to your local IDE or Google Colab for further processing and training.
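
In the YOLOv5 format, each image has a text file with one line per object: the class index followed by the box center, width, and height, all normalized by the image size. The sketch below converts a pixel-coordinate box to such a line. The function name and the example numbers are illustrative; Roboflow writes these files for you during export.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Normalize so every value lies between 0 and 1, independent of image size.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A 200x200 pixel box inside a 640x480 image, class index 0.
    print(to_yolo_line(0, (100, 200, 300, 400), 640, 480))
```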

Save the download snippet that is generated after the export, for example in a text file, so that it is easy to paste into a Google Colab notebook later. The snippet includes an API key, which should be kept in a secure location to prevent unauthorized access. Make sure not to disclose this key to anyone.
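
One common way to keep a key out of a shared notebook is to read it from an environment variable instead of pasting it into the code. This is a minimal sketch; the variable name ROBOFLOW_API_KEY and the helper name are my own choices, not something Roboflow requires.

```python
import os


def get_roboflow_api_key(env_var="ROBOFLOW_API_KEY"):
    """Read the API key from the environment; fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key
```

The returned value can then be passed wherever the key is needed, so the notebook itself never contains the secret.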

Now that we have prepared our dataset, it’s time to dive into the exciting part of the object detection process: training the model. This step is where the algorithm learns to identify and classify objects in images, allowing it to make accurate predictions on unseen data. So let’s get started!

Model Training

In this article, we will be using Colab to train the YOLOv5 object detection model on our custom dataset. Google Colab is a powerful and user-friendly platform for training deep learning models. The first step to getting started with YOLOv5 on Colab is to clone the YOLOv5 GitHub repository. This repository contains all the necessary code and configuration files required for training the model. Cloning the repository can be done by running a simple command in the Colab notebook:

!git clone https://github.com/ultralytics/yolov5

After cloning the YOLOv5 GitHub repository, we need to set up the environment and configure the model for training. This involves several steps such as installing the required dependencies, configuring the paths for the dataset and the model configuration files, and specifying the training parameters such as batch size, learning rate, and number of epochs.

To install the required dependencies, we need to switch into the YOLOv5 directory using the %cd yolov5 command in the Colab notebook. To confirm that we are in the correct directory, we can import the os module and run print(os.getcwd()) to print the current working directory. This ensures that we are in the right place to run the training script for our YOLOv5 model.

%cd yolov5
import os
print(os.getcwd())

Once we are in the right directory, we can run %pip install -qr requirements.txt to install the dependencies listed in the requirements.txt file. We can also install Roboflow, which is required to access our annotated dataset, using %pip install -q roboflow. These commands will set up the necessary environment for training our YOLOv5 model. There is no need to run %cd yolov5 again, since we are already inside the repository.

%pip install -qr requirements.txt
%pip install -q roboflow

After setting up the environment, the next step is to import Roboflow and access the dataset that was created in Roboflow. To do this, you can copy the code that was generated and saved in a text file during the dataset export process into your Colab notebook. Once you have pasted the code, you can run it to download the dataset and load it into your Colab environment. This will allow you to start training your YOLOv5 model using the annotated data.

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="add-your-api-key")
project = rf.workspace("vit-bf5j3").project("hand-detection-kyj5o")
dataset = project.version(1).download("yolov5")
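
The downloaded folder contains a data.yaml file that tells the training script where the images are and which classes exist. To show what such a file typically contains, here is a small sketch that builds one as text. The paths and the class name "hand" are assumptions for illustration; the file Roboflow generates for you may contain additional fields.

```python
def make_data_yaml(train_dir, val_dir, class_names):
    """Build the text of a minimal YOLOv5 dataset config."""
    lines = [
        f"train: {train_dir}",            # folder with training images
        f"val: {val_dir}",                # folder with validation images
        f"nc: {len(class_names)}",        # number of classes
        f"names: {list(class_names)!r}",  # class names, in class-index order
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_data_yaml("../train/images", "../valid/images", ["hand"]))
```

If training later complains about missing images, this file is the first place to look: the paths must resolve from where train.py is run.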

To start training, we can use the following command:

!python train.py --img 640 --batch 8 --epochs 100 --data {dataset.location}/data.yaml --weights yolov5s.pt --cache

In this command, we are specifying the following parameters:

--img: The size of the input images during training. Here, we are setting it to 640x640 pixels.
--batch: The batch size for training. Here, we are setting it to 8.
--epochs: The number of epochs to train for. Here, we are setting it to 100.
--data: The location of the dataset YAML file which we created in Roboflow.
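--weights: The location of the pre-trained YOLOv5 weights to start from. Here, we are using the yolov5s.pt file (the small model), so training fine-tunes an existing model rather than starting from random weights.
--cache: Caches the images (in RAM by default) for faster training.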
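
If you prefer to launch training from a Python script rather than a notebook cell, the same flags can be assembled programmatically. The sketch below uses two illustrative helpers of my own: one builds the argument list and the other estimates how many batches one epoch takes. Only the flag names come from the command above.

```python
import math


def build_train_command(data_yaml, img=640, batch=8, epochs=100,
                        weights="yolov5s.pt", cache=True):
    """Assemble the train.py invocation as an argument list."""
    cmd = [
        "python", "train.py",
        "--img", str(img),
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--data", data_yaml,
        "--weights", weights,
    ]
    if cache:
        cmd.append("--cache")
    return cmd


def batches_per_epoch(num_train_images, batch):
    """Number of batches needed to see every training image once."""
    return math.ceil(num_train_images / batch)


if __name__ == "__main__":
    print(" ".join(build_train_command("dataset/data.yaml")))
    print(batches_per_epoch(100, 8))
```

The list can be passed to subprocess.run with check=True, using the cloned yolov5 directory as the working directory. The batch estimate is useful for judging how long an epoch will take before committing to 100 of them.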

Once we run this command, the training process will begin and we can monitor the progress in the Colab notebook. During the training process, YOLOv5 saves two types of checkpoint files: last.pt and best.pt.

last.pt is the latest saved checkpoint of the model. It is updated after each training epoch and contains the weights of the model at that point in the training process. This checkpoint can be used to resume training from where it was left off or to evaluate the model's performance at a certain epoch.

best.pt is the checkpoint with the best validation score so far. YOLOv5 ranks epochs by a fitness value computed from the validation metrics (by default a weighted combination of mAP@0.5 and mAP@0.5:0.95) rather than by the raw validation loss. The file is updated whenever that score improves and contains the weights of the model at that point. This checkpoint is useful for model selection, as it represents the point in the training process where the model performed best on the validation set. Both last.pt and best.pt can be used for inference after the training process is complete.
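
As far as I can tell from the YOLOv5 repository, the score used to pick best.pt is a weighted sum of the validation metrics in which precision and recall have weight 0, mAP@0.5 has weight 0.1, and mAP@0.5:0.95 has weight 0.9. The sketch below reproduces that idea; treat the weights as an assumption that may differ between versions, and the metric values in the example as made up.

```python
def fitness(precision, recall, map50, map50_95, weights=(0.0, 0.0, 0.1, 0.9)):
    """Weighted sum of validation metrics used to rank epochs."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))


def best_epoch(history):
    """Return the index of the epoch with the highest fitness."""
    scores = [fitness(*metrics) for metrics in history]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up (precision, recall, mAP@0.5, mAP@0.5:0.95) values for three epochs.
    history = [(0.5, 0.5, 0.50, 0.30), (0.4, 0.4, 0.60, 0.40), (0.9, 0.9, 0.55, 0.35)]
    print(best_epoch(history))
```

Note that the third epoch has the highest precision and recall but is not selected, because the ranking is dominated by mAP@0.5:0.95.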

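The .pt file can be downloaded from Google Colab to your local machine using the following command in your Colab notebook. The path assumes this was the first training run, which YOLOv5 writes to runs/train/exp; later runs are saved to exp2, exp3, and so on: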

from google.colab import files
files.download('/content/yolov5/runs/train/exp/weights/best.pt')

This command will download the best.pt file, which contains the weights of the best-performing model during training. It is important to save this file in a secure location, as it represents the trained model and can be used to make predictions on new images or to continue training the model later.
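
Because every new training run gets its own folder (exp, exp2, exp3, ...), a hard-coded path can easily point at an older model. Below is a small sketch for locating the most recent weights file. It assumes the default run naming, and the helper names are my own.

```python
import re
from pathlib import Path


def run_index(run_name):
    """Map 'exp' -> 1, 'exp2' -> 2, ...; return 0 for names that do not match."""
    match = re.fullmatch(r"exp(\d*)", run_name)
    if match is None:
        return 0
    return int(match.group(1) or 1)


def latest_weights(runs_dir="runs/train", name="best.pt"):
    """Return the path of the given weights file in the highest-numbered run."""
    runs = [p for p in Path(runs_dir).glob("exp*") if (p / "weights" / name).is_file()]
    if not runs:
        raise FileNotFoundError(f"no {name} found under {runs_dir}")
    newest = max(runs, key=lambda p: run_index(p.name))
    return newest / "weights" / name
```

Sorting by the numeric suffix rather than alphabetically matters here: alphabetically, exp10 would come before exp2.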

It is not necessary to convert a trained YOLOv5 model in .pt format to .h5 format, as the .pt format is already compatible with PyTorch and can be used for inference or further training. However, if you wish to move the model toward TensorFlow/Keras (for example, on the way to an .h5 file), you can use the following steps:

1. Install the tensorflow, tensorflow_addons, onnx, and onnx-tf libraries in your Python environment.

!pip install tensorflow tensorflow_addons onnx onnx-tf

2. Load the .pt model using PyTorch and export it in ONNX format.

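import torch

# Load the trained model from the .pt file.
# autoshape=False skips the image-preprocessing wrapper, which is not meant to be traced.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='path/to/trained.pt', autoshape=False)
model.eval()

# Create the dummy input on the same device as the model (GPU in a typical Colab session)
device = next(model.parameters()).device
dummy_input = torch.randn(1, 3, 640, 640, device=device)

# Export the model to ONNX format
torch.onnx.export(model, dummy_input, 'yolov5.onnx', opset_version=11)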

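3. Convert the ONNX model to TensorFlow format using the onnx-tf converter.

import onnx
import tensorflow as tf
from onnx_tf.backend import prepare

# Load the ONNX model
onnx_model = onnx.load('yolov5.onnx')

# Convert the model to a TensorFlow representation
tf_rep = prepare(onnx_model)

# Export as a TensorFlow SavedModel directory. tf_rep is not a Keras model,
# so tf.keras.models.save_model() cannot write it to an .h5 file directly.
tf_rep.export_graph('yolov5_saved_model')

# Reload the SavedModel to confirm the export worked
loaded = tf.saved_model.load('yolov5_saved_model')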

Note that the conversion from PyTorch to ONNX format may result in some loss of precision, and the conversion from ONNX to TensorFlow format may result in some differences in behavior due to differences in the implementations of certain operations. Therefore, it is generally recommended to use the .pt format directly if possible.
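
A simple way to check how much a conversion changed the model is to run the same input through both versions and compare the raw outputs. The sketch below does this for two flat lists of numbers. The helper names and the tolerance of 1e-4 are arbitrary choices of mine, and the values in the example are made up.

```python
def max_abs_diff(reference, converted):
    """Largest element-wise absolute difference between two flat output lists."""
    if len(reference) != len(converted):
        raise ValueError("outputs must have the same length")
    return max((abs(r - c) for r, c in zip(reference, converted)), default=0.0)


def outputs_match(reference, converted, atol=1e-4):
    """True if every element differs by at most atol."""
    return max_abs_diff(reference, converted) <= atol


if __name__ == "__main__":
    original_out = [0.5, 0.25, 0.125]      # made-up values
    converted_out = [0.50001, 0.25, 0.125]
    print(outputs_match(original_out, converted_out))
```

With real models you would flatten the output tensors first, for example with .flatten().tolist() on a PyTorch tensor.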

Model Evaluation

To evaluate the performance of the trained model, you can use TensorBoard, a tool for visualizing and analyzing machine learning experiments. To use TensorBoard in Colab, load the extension with the %load_ext tensorboard command, and then run %tensorboard --logdir runs to start the TensorBoard server. This displays a TensorBoard dashboard inline in the notebook, where you can view metrics logged during training, such as the loss curves, precision, recall, and mean average precision (mAP). You can also compare the performance of different training runs.

%load_ext tensorboard
%tensorboard --logdir runs
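
The precision, recall, and mAP curves are all built on one basic quantity: the intersection over union (IoU) between a predicted box and a ground-truth box. A prediction counts as correct when its IoU with a ground-truth box exceeds a threshold (0.5 for mAP@0.5). The sketch below shows these building blocks; the function names and example numbers are illustrative and this is not the YOLOv5 implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=8, fp=2, fn=4))
```

Precision answers "how many of the detections were right?", while recall answers "how many of the objects were found?". A model can score well on one and poorly on the other, which is why both curves are worth checking.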

In conclusion, building an object detection model using YOLOv5 and Roboflow can be a relatively simple and efficient process. With the help of Roboflow, the time-consuming task of annotating images can be greatly reduced, allowing for more time to focus on training and fine-tuning the model. Additionally, Google Colab provides a convenient and accessible platform for training and evaluating the model. By following the steps outlined in this article, you can build an object detection model that can accurately detect objects in images and videos, and even deploy it for use in real-world applications.

The YOLOv5 model offers a great balance between accuracy and speed, making it a popular choice for object detection tasks. However, it’s important to note that creating a high-quality dataset and tweaking the training parameters can significantly impact the model’s accuracy. With that in mind, get started with building your own object detection model today and see the exciting results!

Written by Prithvi Seshadri