One challenge of face identification arises when you want to add a new person to the existing list. Do you retrain your network with tons of this new person's face images along with everyone else's? If we build a classification model, how can the model classify an unknown face?
In this demo, we tackle the challenge by computing the similarity of two faces: one in our database and one captured from the webcam.
The VGGFace model "encodes" a face into a representation of 2048 numbers.
We then compute the Euclidean distance between the two "encoded" faces. If they are the same person, the distance will be low; if they are two different people, it will be high.
At identification time, if the distance is below a threshold, we predict that the two pictures show the same person.
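To make that concrete, here is a toy sketch of the comparison (the two vectors are random placeholders; in the demo they come from the VGGFace model):
import numpy as np

# Two 2048-dimensional "encodings" (random here, just for illustration)
face_a = np.random.rand(2048)
face_b = np.random.rand(2048)

distance = np.linalg.norm(face_a - face_b)  # Euclidean distance
THRESHOLD = 100  # the same kind of threshold used later in identify_face
same_person = distance < THRESHOLD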
The model itself is based on the ResNet50 architecture, which is popular for processing image data.
Let's first take a look at the demo.
The demo source code contains two files. The first file precomputes the "encoded" face features and saves the results along with the persons' names.
The second is the live demo: it captures frames from a webcam and identifies any known faces in them.
Let's jump into it.
One standard way to add a new person to the model is called one-shot learning: you have to learn from just one example to recognize the person again.
That can be risky, since this one photo could be badly lit or the pose of the face could be poor. So my approach is to extract faces from a short video clip that contains only this person, and to calculate the "mean features" by averaging the features computed for each extracted image.
You can find the full source in precompute_features.py, but here are the important parts that make the magic happen.
We have one or more video files for each person. FaceExtractor's extract_faces method takes a video file and reads it frame by frame. For each frame, it crops the face area and saves the face as an image file into the save_folder.
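Here is a rough sketch of the frame-reading and face-cropping part of such a method, using OpenCV's bundled Haar cascade (an assumption on my part; the detector and cropping details in the actual repo may differ):
import os
import cv2

class FaceExtractor(object):
    def __init__(self):
        # Assumption: a Haar cascade face detector; the repo may use a different detector
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        self.face_cascade = cv2.CascadeClassifier(cascade_path)

    def extract_faces(self, video_file, save_folder):
        prefix = os.path.splitext(os.path.basename(video_file))[0]
        cap = cv2.VideoCapture(video_file)
        frame_count = 0
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = self.face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            for face_idx, (x, y, w, h) in enumerate(faces):
                # Crop the face region and resize it to the VGGFace input size
                face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
                out_name = "%s_frame_%d_%d.jpg" % (prefix, frame_count, face_idx)
                cv2.imwrite(os.path.join(save_folder, out_name), face)
            frame_count += 1
        cap.release()
The driver script below then creates a FaceExtractor and runs it over every video of every person: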
import glob
import os

FACE_IMAGES_FOLDER = "./data/face_images"
VIDEOS_FOLDER = "./data/videos"

# One sub-folder per person under VIDEOS_FOLDER, each holding that person's video clips
extractor = FaceExtractor()
folders = list(glob.iglob(os.path.join(VIDEOS_FOLDER, '*')))
os.makedirs(FACE_IMAGES_FOLDER, exist_ok=True)
names = [os.path.basename(folder) for folder in folders]

for i, folder in enumerate(folders):
    name = names[i]
    videos = list(glob.iglob(os.path.join(folder, '*.*')))
    save_folder = os.path.join(FACE_IMAGES_FOLDER, name)
    print(save_folder)
    os.makedirs(save_folder, exist_ok=True)
    for video in videos:
        extractor.extract_faces(video, save_folder)
In the extract_faces method, we call the VGGFace feature extractor to generate face features like this:
from keras_vggface.vggface import VGGFace

resnet50_features = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3),
                            pooling='avg')  # pooling: None, avg or max

# images is a numpy array with shape (N, 224, 224, 3)
features = resnet50_features.predict(images)
# features is a numpy array with shape (N, 2048)
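One detail worth mentioning: keras_vggface also ships a preprocess_input utility that applies the channel-mean subtraction the model was trained with, and the batch normally goes through it before predict (version=2 corresponds to the ResNet50/SENet50 weights, if I remember the API correctly):
from keras_vggface.utils import preprocess_input

# images is the (N, 224, 224, 3) float array from above
images = preprocess_input(images, version=2)
features = resnet50_features.predict(images)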
We do this for all people's videos. Then we extract features for those face images, calculate the "mean face features" for each person, and save them to a file for the demo part.
precompute_features = []
for i, folder in enumerate(folders):
    name = names[i]
    save_folder = os.path.join(FACE_IMAGES_FOLDER, name)
    # Average all of this person's face features into one vector
    mean_features = cal_mean_feature(image_folder=save_folder)
    precompute_features.append({"name": name, "features": mean_features})
# pickle_stuff is a small helper in the repo that pickles the list to the given path
pickle_stuff("./data/precompute_features.pickle", precompute_features)
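The cal_mean_feature helper is where the averaging happens. Here is a minimal sketch of it, assuming OpenCV for image loading and the resnet50_features model defined earlier (the real implementation in the repo may batch and preprocess the images differently):
import glob
import os
import cv2
import numpy as np

def cal_mean_feature(image_folder):
    face_images = list(glob.iglob(os.path.join(image_folder, '*.*')))
    batch = []
    for face_image in face_images:
        img = cv2.imread(face_image)
        img = cv2.resize(img, (224, 224))
        batch.append(img)
    images = np.array(batch, dtype=np.float64)
    # (N, 2048) features from the VGGFace ResNet50 model defined earlier
    features = resnet50_features.predict(images)
    # Average over the N face images to get one (2048,) "mean feature" vector
    return np.mean(features, axis=0)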
Since we have already pre-computed each person's face features, the live demo part only needs to load the features file we just saved.
It then extracts faces from the webcam frames, computes their features, and compares them with our precomputed features to look for matches. If a matching face is found, we draw the person's name on the frame overlay.
The method below takes the features computed from a face in the webcam image and compares them with each of our known faces' features.
# sp refers to scipy; the spatial submodule needs an explicit import
import scipy as sp
import scipy.spatial

def identify_face(self, features, threshold=100):
    distances = []
    # Euclidean distance to every known person's mean features
    for person in self.precompute_features_map:
        person_features = person.get("features")
        distance = sp.spatial.distance.euclidean(person_features, features)
        distances.append(distance)
    min_distance_value = min(distances)
    min_distance_index = distances.index(min_distance_value)
    # Only accept the closest match if it is within the threshold
    if min_distance_value < threshold:
        return self.precompute_features_map[min_distance_index].get("name")
    else:
        return "?"
If the person's face feature is "far away" from all of our known face features, we show the "?" sign on the final image overlay to indicate this is an unknown face.
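Putting it together, the webcam loop looks roughly like the sketch below. detect_and_encode is a hypothetical helper standing in for whatever the repo uses to crop a face from the frame and run it through the VGGFace model; the drawing calls are standard OpenCV:
import cv2

def run_live_demo(identifier):
    cap = cv2.VideoCapture(0)  # default webcam
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # detect_and_encode (hypothetical) yields a face box and its 2048-d features
        for (x, y, w, h), features in detect_and_encode(frame):
            name = identifier.identify_face(features)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, name, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("Face Identification", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()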
A demo is shown below.
I have only included 3 people in this demo. As you can imagine, as the number of people grows, the model becomes more likely to confuse two similar faces.
If that happens, you could consider exploring a Siamese network with triplet loss, as shown in the Coursera course.
FaceNet is a good example.
For those interested, the full source code is available in my GitHub repo. Enjoy!