Ever wonder what you would look like if you were a girl?
Imagine this. I jump out of bed and look in the mirror. I am a blond!
You ask: "That is what you would look like as a girl?"
I say: "YES OMG YES YES YES! This is what I've always wanted!"
The magic mirror is powered by StarGAN, a unified generative adversarial network for multi-domain image-to-image translation. This post will show you how the model works and how you can build the magic mirror.
Image-to-image translation means changing a particular aspect of a given image, e.g., changing the gender of a person from male to female. This task has seen significant improvements since the introduction of generative adversarial networks (GANs), with results ranging from generating photos from edge maps and changing the seasons of scenery images to reconstructing photos from Monet's paintings.
Given training data from two different domains, these models learn to translate images from one domain to the other in a unidirectional way. For example, one generative model is trained to translate a person with black hair to blond hair. A single such model cannot translate "backward", from blond to black hair in the previous example. Nor can it handle flexible multi-domain translation tasks, such as a configurable translation of both gender and hair color. That is where StarGAN stands out: it is a generative adversarial network that learns the mappings among multiple domains using only a single generator and a single discriminator, training effectively on images from all domains. Instead of learning a fixed translation (e.g., black-to-blond hair), StarGAN's generator takes both an image and domain information as inputs and learns to translate the input image into the corresponding domain flexibly.
The pre-trained StarGAN model consists of two networks like other GAN models: a generative network and a discriminative network. While only the generative network is needed to build the magic mirror, it is still useful to understand where the complete model comes from.
The generative network takes two pieces of information as input, the original RGB image at 256 x 256 resolution and the target labels, and generates a fake image with the same resolution. The discriminative network learns to distinguish between real and fake images and to classify real images into their corresponding domains.
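To make the "two inputs" concrete: as I understand the official StarGAN implementation, the label vector is replicated across every pixel and concatenated with the image along the channel axis before the first convolution. The NumPy sketch below mimics that step outside of PyTorch. The function name `concat_image_and_labels` is made up for illustration, and the five-element label vector is an assumption based on the CelebA setup.

```python
import numpy as np


def concat_image_and_labels(image, labels):
    """image: (3, H, W) float array; labels: (n,) float array -> (3 + n, H, W)."""
    _, height, width = image.shape
    # Turn each label into a constant H x W plane.
    label_planes = np.broadcast_to(
        labels[:, None, None], (labels.shape[0], height, width)
    )
    # Stack the label planes behind the three colour channels.
    return np.concatenate([image, label_planes], axis=0)


image = np.zeros((3, 256, 256), dtype=np.float32)      # dummy RGB image
labels = np.array([0, 1, 0, 0, 1], dtype=np.float32)   # assumed 5-attribute labels
generator_input = concat_image_and_labels(image, labels)
print(generator_input.shape)  # (8, 256, 256)
```

With this layout, every convolution in the generator sees the target domain at every spatial location, which is one plausible reason a single generator can serve all domains.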
The pre-trained model we are going to use was trained on the CelebA dataset, which contains 202,599 face images of celebrities, each annotated with 40 binary attributes. The researchers selected seven domains using the following attributes: hair color (black, blond, brown), gender (male/female), and age (young/old).
The StarGAN researchers have published their code on GitHub, and our magic mirror project is based on it. It was also my first time dealing with the PyTorch framework, and so far it's going well. If you are new to PyTorch like me, you will find it quite easy to get started with, especially if you have experience with another deep learning framework like Keras or TensorFlow.
Only basic PyTorch knowledge is required to accomplish the project, such as PyTorch tensors and loading predefined model weights.
Let's start by installing the framework. In my case, that is on Windows 10, which is officially supported by the latest PyTorch.
To let the magic mirror run in real time with minimal perceivable lag, accelerate the model execution with your gaming PC's Nvidia graphics card if you have one.
Install CUDA 9 from the Nvidia Developer website.
After that, install PyTorch with CUDA 9.0 support by following the instructions on its official website.
When PyTorch and other Python dependencies are installed, we are ready for the code.
To implement simple real-time face tracking and cropping, we are going to use the lightweight CascadeClassifier module from OpenCV's Python library. This module takes a grayscale image converted from a webcam frame and returns the bounding boxes of the detected faces. If multiple faces are detected in a frame, we take the "main" face, the one with the largest bounding box area.
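The "largest face wins" rule can be sketched in a few lines. OpenCV's `detectMultiScale` reports each face as an `(x, y, w, h)` box, so the area is simply `w * h`. The helper name `largest_face` and the sample boxes below are made up for illustration; no webcam or OpenCV install is needed to run it.

```python
def largest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if no face was found."""
    if len(faces) == 0:
        return None
    return max(faces, key=lambda box: box[2] * box[3])


# Three made-up detections in (x, y, w, h) form.
faces = [(10, 20, 50, 50), (200, 80, 120, 130), (5, 5, 30, 40)]
main_face = largest_face(faces)
print(main_face)  # (200, 80, 120, 130)
```

Returning `None` for an empty result lets the main loop skip frames in which nobody is in front of the mirror.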
Since the StarGAN generative network expects images whose pixel values range from -1 to 1 instead of 0 to 255, we are going to use the image transform utilities from torchvision, PyTorch's companion vision package, to handle the image preprocessing.
    from torchvision import transforms as T

    # Build the preprocessing pipeline step by step.
    transform = []
    transform.append(T.ToTensor())  # HWC uint8 in [0, 255] -> CHW float in [0, 1]
    transform.append(T.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)))  # [0, 1] -> [-1, 1]
    transform = T.Compose(transform)

    # Pre-process the image
    preprocessed_image = transform(face_img)
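Why a mean and standard deviation of 0.5 end up at the -1 to 1 range is easy to check by hand. The sketch below redoes the arithmetic of the two transforms for a single 8-bit pixel value in plain Python; `normalize_pixel` and `denormalize_pixel` are hypothetical helper names, not part of torchvision.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Map an 8-bit pixel value (0..255) to the range the generator expects."""
    scaled = value / 255.0        # the scaling T.ToTensor() applies to 8-bit images
    return (scaled - mean) / std  # the per-channel step of T.Normalize()


def denormalize_pixel(value):
    """Inverse mapping for generator output: [-1, 1] -> [0, 1]."""
    return (value + 1) / 2


print(normalize_pixel(0))    # -1.0
print(normalize_pixel(255))  # 1.0
```

The inverse mapping is the same `(x + 1) / 2` step that is applied to the generated image before it is displayed.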
The generative network subclasses PyTorch's nn.Module, so it can be called like a function with the input tensors as arguments.
    # Run the generator to generate the desired image with labels.
    generated = G(preprocessed_image.unsqueeze(0).to(device),
                  labels.unsqueeze(0).to(device))
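The value of labels will be set to [0, 1, 0, 0, 1]. Assuming the attribute order used by the official StarGAN code for CelebA (black hair, blond hair, brown hair, male, young), this asks for a young, blond, female face.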
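Writing the label vector by hand is error-prone, so here is a small sketch that builds it from attribute names. The attribute order in `ATTRIBUTES` is an assumption (the default selection of the official StarGAN CelebA setup, as far as I can tell), and `make_labels` is a hypothetical helper, not part of the StarGAN code.

```python
# Assumed attribute order; check it against the model you actually load.
ATTRIBUTES = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Male", "Young"]


def make_labels(active):
    """Return a 0/1 list with a 1 for every attribute name in `active`."""
    unknown = set(active) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in active else 0 for name in ATTRIBUTES]


labels = make_labels({"Blond_Hair", "Young"})
print(labels)  # [0, 1, 0, 0, 1]
```

Before being passed to the generator, the list would still need to be turned into a float tensor, for example with `torch.tensor(labels, dtype=torch.float32)`.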
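    import numpy as np

    # Tensor -> NumPy, drop the batch axis, move channels last, rescale to [0, 1],
    # then mirror the image and reorder the channels from RGB to BGR.
    generated_frame = ((np.moveaxis(generated.cpu().detach().numpy()[0], 0, 2) + 1) / 2)[:, ::-1, ::-1]

And here is the breakdown. The call chain .cpu().detach().numpy() moves the output tensor from the GPU to the CPU, detaches it from the computation graph, and converts it to a NumPy array. Indexing with [0] picks the only image in the batch. np.moveaxis moves the channel axis to the end, turning the channels-first layout used by PyTorch into the height x width x channels layout used by OpenCV. Adding 1 and dividing by 2 rescales the pixel values from the -1 to 1 range to the 0 to 1 range. Finally, [:, ::-1, ::-1] flips the image left to right, as a mirror would, and reverses the channel order from RGB to BGR, which is the order OpenCV uses for display.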
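The same post-processing steps can be tried on a tiny dummy array without a GPU or a trained model. This is a NumPy-only sketch: `to_display_frame` is a made-up name, and the input stands in for the array you would get from `generated.cpu().detach().numpy()`.

```python
import numpy as np


def to_display_frame(generated):
    """(1, 3, H, W) array in [-1, 1] -> mirrored (H, W, 3) BGR array in [0, 1]."""
    image = generated[0]              # drop the batch dimension -> (3, H, W)
    image = np.moveaxis(image, 0, 2)  # channels last -> (H, W, 3)
    image = (image + 1) / 2           # [-1, 1] -> [0, 1]
    return image[:, ::-1, ::-1]       # mirror left-right, RGB -> BGR


# Dummy 4 x 6 "image": everything off, except red fully on in the left-most column.
fake = np.full((1, 3, 4, 6), -1.0, dtype=np.float32)
fake[0, 0, :, 0] = 1.0
frame = to_display_frame(fake)
print(frame.shape)  # (4, 6, 3)
```

After the flip, the red column sits on the right edge and in the last channel, which is where red belongs in a BGR image.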
Wrapping the code into a single function
This tutorial shows you how easy and fun it could be to pick up a new framework like PyTorch and build something interesting with a pre-trained StarGAN network.
The generated images might not look super realistic yet. The StarGAN paper shows that a model jointly trained on both the CelebA and RaFD datasets can generate images with fewer artifacts, because it leverages both datasets to improve shared low-level tasks such as facial keypoint detection and segmentation. You can follow along with their official GitHub to download both datasets and train such a model as long as you have a beefy machine and