Hugo Ponte
Training AI with CGI
Updated: Feb 23, 2021
We used only synthetic data to train a computer vision model to identify components on a Raspberry Pi board.
Training with synthetic data is an increasingly popular way to quench the thirst of data-hungry deep learning models. The data sets used for this project are available for free at app.zumolabs.ai (just create a free account and download them yourself!). We want to make using synthetic data easy for everyone and plan to release more data sets in the future.
The Problem
The Raspberry Pi is a single-board computer very popular with hobbyists. Our goal was to detect some of the sub-components that sit on the board: the pin connectors, the audio jack, and the ethernet port. Though this is a toy problem, it is not far from what you see in the real world — where automating component and defect detection using computer vision can improve the speed and reliability of manufacturing.
The Data
We start with a 3D model of the Raspberry Pi board. We then use a game engine (such as Unity or Unreal Engine) to render thousands of images of this 3D model from a variety of camera angles and under a variety of lighting conditions. Each image has a corresponding segmentation mask, which sections out the different components in the image. In future articles we will dive deeper into the process of creating synthetic images (so stay tuned!).
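To give a feel for what the training loop consumes, here is a minimal sketch of a PyTorch dataset that pairs each rendered image with its segmentation mask. The directory layout and file naming are assumptions for illustration, not the actual format of our released data sets.

```python
# A minimal sketch of loading synthetic image / segmentation-mask pairs.
# Assumed (hypothetical) layout: images/0001.png pairs with masks/0001.png,
# where each mask pixel value encodes a component class id.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor


class SyntheticRPiDataset(Dataset):
    """Pairs each rendered image with its segmentation mask."""

    def __init__(self, root: str):
        root = Path(root)
        self.image_paths = sorted((root / "images").glob("*.png"))
        self.mask_paths = sorted((root / "masks").glob("*.png"))
        assert len(self.image_paths) == len(self.mask_paths)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = to_tensor(Image.open(self.image_paths[idx]).convert("RGB"))
        # Keep the mask as integer class ids, not normalized floats.
        mask = torch.from_numpy(np.array(Image.open(self.mask_paths[idx]))).long()
        return image, mask
```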
Closing the Sim2Real Gap
So now that we have thousands of synthetic images, we should be good, right? No! It is very important to test synthetically-trained models on real data to know whether they are successfully generalizing. There is a gap between simulation-produced data and real data known as the sim2real gap. One way to think about it is that deep learning models will overfit on the smallest of details, and if we aren't careful many of these details may only exist in the synthetic data.
One way we can start closing the sim2real gap is through a technique known as domain randomization [1][2]. This strategy involves randomizing properties of the virtual scene, especially the visual appearance of the backgrounds and the RPi itself. This has the downstream effect of making the model we train on this data more robust to variations in color and lighting; in other words, it improves the network's ability to generalize.
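The randomization itself happens inside the game engine, but conceptually it boils down to sampling every scene parameter from a wide range before each render. Here is an engine-agnostic sketch; the parameter names and ranges are illustrative assumptions, not our actual pipeline.

```python
# A conceptual sketch of domain randomization: before each render, sample
# scene parameters uniformly from wide ranges. All names and ranges below
# are illustrative assumptions.
import random


def sample_scene_params(background_textures):
    return {
        # Camera pose: a random orbit around the board.
        "camera_azimuth_deg": random.uniform(0.0, 360.0),
        "camera_elevation_deg": random.uniform(15.0, 75.0),
        "camera_distance_m": random.uniform(0.15, 0.60),
        # Lighting: vary intensity and color temperature.
        "light_intensity": random.uniform(0.2, 3.0),
        "light_color_kelvin": random.uniform(2500, 9000),
        # Appearance: random background and a slight tint on the board.
        "background_texture": random.choice(background_textures),
        "board_hue_shift": random.uniform(-0.1, 0.1),
    }


if __name__ == "__main__":
    textures = ["wood.jpg", "concrete.jpg", "noise.png"]  # placeholder assets
    for _ in range(3):
        print(sample_scene_params(textures))
```

Wider ranges make the synthetic data harder to overfit, at the cost of a harder learning problem.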

Training the Model
We trained our model with four different synthetic data sets to show how domain randomization and data set size affect performance on our real test data set. We used a model from PyTorch’s torchvision library based on the ResNet architecture [3]. Synthetic data will work with any model architecture, so feel free to experiment and find the one that best fits your use case.
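The paragraph above only pins the model down to torchvision with a ResNet backbone, so treat the following as one plausible setup rather than our exact configuration: a Mask R-CNN with a ResNet-50 FPN backbone, its heads resized for four classes (background plus the three components).

```python
# A sketch of one plausible torchvision setup (the exact detection head we
# used isn't specified above): Mask R-CNN with a ResNet-50 FPN backbone.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 4  # background, pin connectors, audio jack, ethernet port

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Swap the box-classification head for one sized to our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Swap the mask head the same way (256 hidden channels, as in the
# torchvision fine-tuning tutorial).
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, NUM_CLASSES)
```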
We use mAP (mean average precision) to measure the performance of our computer vision model. As we predicted, model performance increases as we add more synthetic data. Deep learning models will almost always improve with larger data sets, but, more interestingly, training on a domain-randomized synthetic data set yields a significant performance boost on our real test data set.
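For readers who want to reproduce the evaluation: mAP averages detection precision over recall levels and IoU thresholds. The post doesn't say which implementation we used; this sketch assumes the off-the-shelf torchmetrics metric, fed hypothetical predictions for a single test image.

```python
# Evaluating detections with mAP. The torchmetrics implementation is an
# assumption here; the boxes and labels below are hypothetical.
import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision()

# Hypothetical model output and ground truth for one real test image,
# with boxes in (x1, y1, x2, y2) pixel coordinates.
preds = [{
    "boxes": torch.tensor([[120.0, 40.0, 260.0, 90.0]]),
    "scores": torch.tensor([0.93]),
    "labels": torch.tensor([3]),  # e.g. the ethernet port class
}]
targets = [{
    "boxes": torch.tensor([[118.0, 42.0, 258.0, 88.0]]),
    "labels": torch.tensor([3]),
}]

metric.update(preds, targets)
print(metric.compute()["map"])  # overall mAP across IoU thresholds
```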
Conclusion
TL;DR: we trained a computer vision model to detect sub-components of a Raspberry Pi using entirely synthetic data. We used the technique of domain randomization to improve the performance of our model on real images. And, ta-da! Our trained model works on real data despite never having seen a single real image.
(By Hugo Ponte. Questions, comments, suggestions? Send them his way at hugo@zumolabs.ai.)
[1] Lilian Weng. "Domain Randomization for Sim2Real Transfer." Lil'Log, 2019. (https://lilianweng.github.io/lil-log/2019/05/05/domain-randomization.html).
[2] Josh Tobin, et al. "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World." IROS, 2017. (https://arxiv.org/abs/1703.06907).
[3] Torchvision on GitHub. (https://github.com/pytorch/vision).