LFW (Labeled Faces in the Wild): Face Recognition in the Real World



The Labeled Faces in the Wild (LFW) dataset is a well-known benchmark for face recognition and face verification in unconstrained environments. Created by researchers at the University of Massachusetts Amherst, it was among the first large-scale datasets to capture faces in everyday, "in-the-wild" scenarios, far from studio-controlled settings.


🧠 What is LFW?

LFW contains thousands of face images collected from news articles on the web, representing over 5,700 individuals. The dataset's main goal is to evaluate algorithms for:

  • ✅ Face Verification: Are two faces the same person?

  • 🧭 Face Recognition: Who is the person in the image?
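
The difference between the two tasks can be made concrete with a minimal sketch. It assumes each face has already been turned into an embedding vector by some model; the vectors, the names `person_a` and `person_b`, and the 0.5 threshold are made-up illustrations, not values from LFW.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """Verification (1:1): are the two faces the same person?
    The threshold is an illustrative assumption and would be tuned in practice."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(probe, gallery):
    """Recognition (1:N): return the gallery name most similar to the probe."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))

# Hypothetical embeddings, for illustration only.
gallery = {"person_a": [1.0, 0.0, 0.1], "person_b": [0.0, 1.0, 0.1]}
probe = [0.9, 0.1, 0.1]

print(verify(probe, gallery["person_a"]))  # 1:1 decision
print(identify(probe, gallery))            # 1:N decision
```

Verification returns a yes/no answer for one pair, while recognition searches a set of known identities, which is why LFW's pair-based protocol measures the former.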


📊 Dataset Overview

| Feature | Details |
|---|---|
| Total images | 13,233 face images |
| Individuals | 5,749 people |
| People with more than one image | 1,680 |
| Source | News photographs collected from the web (faces found with the Viola-Jones detector) |
| Image size | 250×250 pixels (centered, cropped) |
| Format | JPEG |

🔢 How is LFW Organized?

The images are distributed in three versions:

  1. LFW Funneled: Images are aligned with funneling, an unsupervised joint-alignment technique, for easier benchmarking.

  2. LFW Raw: Original cropped faces without alignment.

  3. LFW DeepFunneled: Higher-quality alignment using deep learning.

Each file is named like:

[Person_Name]/[Person_Name]_[Image_Number].jpg

Example:

George_W_Bush/George_W_Bush_0001.jpg
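
The naming scheme above can be built and parsed with a few lines of Python. This is a small sketch: the helper names `lfw_image_path` and `parse_lfw_image_path` are hypothetical, and the four-digit zero padding is inferred from the `0001` in the example.

```python
from pathlib import PurePosixPath
from typing import Tuple

def lfw_image_path(person_name: str, image_number: int) -> str:
    """Build a relative path such as George_W_Bush/George_W_Bush_0001.jpg.
    Assumes image numbers are zero-padded to four digits, as in the example."""
    return f"{person_name}/{person_name}_{image_number:04d}.jpg"

def parse_lfw_image_path(path: str) -> Tuple[str, int]:
    """Split a relative LFW path back into (person_name, image_number)."""
    stem = PurePosixPath(path).stem                 # e.g. "George_W_Bush_0001"
    person_name, _, number = stem.rpartition("_")   # split on the last underscore
    return person_name, int(number)

print(lfw_image_path("George_W_Bush", 1))
print(parse_lfw_image_path("George_W_Bush/George_W_Bush_0001.jpg"))
```

Splitting on the last underscore matters because person names themselves contain underscores.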

🔍 Evaluation Protocols

LFW provides multiple evaluation setups:

1. Face Verification (default)

  • Compares pairs of faces.

  • 6,000 face pairs (3,000 matching, 3,000 non-matching), split into 10 folds for cross-validation.

  • Results are usually reported as mean accuracy across the folds.

2. Unrestricted with Labeled Outside Data

  • Allows training with external datasets (like VGGFace or MS-Celeb-1M).
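
Pair-based verification accuracy can be sketched as follows. This is a simplified illustration, not the official evaluation code: it assumes each pair has already been reduced to a similarity score, tunes a single decision threshold on the training folds, and uses two tiny made-up folds in place of the real ones.

```python
from statistics import mean

def verification_accuracy(scores, same_person, threshold):
    """Fraction of pairs classified correctly when score >= threshold means 'same person'."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, same_person))
    return correct / len(scores)

def best_threshold(scores, same_person):
    """Pick the observed score that gives the highest accuracy when used as threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: verification_accuracy(scores, same_person, t))

def cross_validated_accuracy(folds):
    """folds: list of (scores, same_person) tuples, one per fold.
    Each fold is held out once; the threshold is tuned on all other folds."""
    accuracies = []
    for i, (test_scores, test_labels) in enumerate(folds):
        train_scores = [s for j, (sc, _) in enumerate(folds) if j != i for s in sc]
        train_labels = [y for j, (_, lb) in enumerate(folds) if j != i for y in lb]
        threshold = best_threshold(train_scores, train_labels)
        accuracies.append(verification_accuracy(test_scores, test_labels, threshold))
    return mean(accuracies)

# Made-up similarity scores (1 = matching pair, 0 = non-matching); not real LFW results.
toy_folds = [
    ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ([0.7, 0.6, 0.4, 0.1], [1, 1, 0, 0]),
]
print(cross_validated_accuracy(toy_folds))
```

Tuning the threshold only on the training folds keeps the held-out fold untouched, which is the point of reporting a cross-validated mean.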


🧪 Face Verification with LFW in Python

Load the Dataset Using sklearn

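from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt

# Download (on first call) and load LFW, keeping only people with at least
# 70 images and scaling each face image by a factor of 0.4.
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

print("Images shape:", lfw.images.shape)  # (n_samples, height, width)
print("Target names:", lfw.target_names)

# Show the first five faces with their labels
fig, axes = plt.subplots(1, 5, figsize=(12, 4))
for i, ax in enumerate(axes):
    ax.imshow(lfw.images[i], cmap='gray')
    ax.set_title(lfw.target_names[lfw.target[i]])
    ax.axis('off')
plt.tight_layout()
plt.show()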

🧠 Models Trained or Evaluated on LFW

LFW has been used as a benchmark for many face recognition models:

| Model | Accuracy (%) | Year |
|---|---|---|
| Eigenfaces + PCA | ~60% | 2003 |
| LBP (Local Binary Pattern) | ~78% | 2007 |
| DeepFace (Facebook) | 97.35% | 2014 |
| FaceNet (Google) | 99.63% | 2015 |
| ArcFace (InsightFace) | 99.83%+ | 2019 |

The deep models in this table learn face embeddings: FaceNet is trained with triplet loss, and ArcFace with an additive angular margin loss.
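
To make the triplet-loss idea concrete, here is a minimal pure-Python sketch for a single triplet. It is an illustration under stated assumptions, not any model's training code: the embeddings are toy 2-D vectors and the margin of 0.2 is an illustrative choice.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin) on L2-normalized embeddings.
    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)

# Toy embeddings: the positive matches the anchor, the negative is orthogonal to it.
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # well separated
print(triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # positive and negative swapped
```

The first triplet is already well separated, so it contributes no loss; the second is penalized because the "positive" is farther from the anchor than the "negative".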



🧵 Summary

| Feature | Value |
|---|---|
| Total images | 13,233 |
| Unique people | 5,749 |
| Faces per person (min) | 1 (1,680 people have ≥2 images) |
| Evaluation | Face verification (6,000 pairs) |
| Focus | Real-world face recognition |

The LFW dataset was a game-changer for face recognition research. Newer and larger datasets such as VGGFace2, MS-Celeb-1M, and CASIA-WebFace now dominate, but LFW remains a lightweight, reliable benchmark, well suited to testing models and learning the basics of face recognition.
