LFW (Labeled Faces in the Wild): Face Recognition in the Real World
The Labeled Faces in the Wild (LFW) dataset is a well-known benchmark for face recognition and face verification in unconstrained environments. Created by researchers at the University of Massachusetts Amherst, it was among the first large-scale datasets that captured faces in everyday, "in-the-wild" scenarios—far from studio-controlled settings.
What is LFW?
LFW contains thousands of face images collected from news articles on the web, representing over 5,700 individuals. The dataset’s main goal is to evaluate algorithms for:
- ✅ Face Verification – Are two faces the same person?
- 🧭 Face Recognition – Who is the person in the image?
Dataset Overview
Feature | Details |
---|---|
Total Images | 13,233 face images |
Individuals | 5,749 people |
People with >1 image | 1,680 |
Source | News photos from the web (faces found with the Viola-Jones detector) |
Image Size | 250×250 pixels (centered, cropped) |
Format | JPEG |
How is LFW Organized?
LFW is distributed in three versions:
- LFW Funneled – Images aligned with "funneling", an unsupervised joint alignment method, for easier benchmarking.
- LFW Raw – Original cropped faces without alignment.
- LFW DeepFunneled – Higher-quality alignment using deep learning.
Each file is named like:
[Person_Name]/[Person_Name]_[Image_Number].jpg
Example:
George_W_Bush/George_W_Bush_0001.jpg
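As a small illustration of this naming scheme, the sketch below builds and parses LFW-style relative paths. The helper names `lfw_relpath` and `parse_lfw_relpath` are made up for this example, and the four-digit zero padding is inferred from the `George_W_Bush_0001.jpg` example above rather than taken from a specification.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration; not part of any official LFW tooling.
# The backreference (?P=name) enforces that folder and file share the same name.
_LFW_PATH = re.compile(r"^(?P<name>.+)/(?P=name)_(?P<num>\d{4})\.jpg$")


def lfw_relpath(person_name: str, image_number: int) -> str:
    """Build '<Person_Name>/<Person_Name>_<NNNN>.jpg' from a name and index."""
    name = person_name.strip().replace(" ", "_")
    return f"{name}/{name}_{image_number:04d}.jpg"


def parse_lfw_relpath(path: str) -> Tuple[str, int]:
    """Split an LFW-style relative path into (person_name, image_number)."""
    match = _LFW_PATH.match(path)
    if match is None:
        raise ValueError(f"not an LFW-style path: {path!r}")
    return match.group("name"), int(match.group("num"))


print(lfw_relpath("George W Bush", 1))
print(parse_lfw_relpath("George_W_Bush/George_W_Bush_0001.jpg"))
```

Keeping the person name in both the folder and the file name means a file's identity label can be recovered even if it is copied out of its folder.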
Evaluation Protocols
LFW provides multiple evaluation setups:
1. Face Verification (default)
   - Compares pairs of faces.
   - 6,000 face pairs (3,000 matching, 3,000 non-matching), split into 10 folds for cross-validation.
   - Commonly used to report accuracy.
2. Unrestricted with Labeled Outside Data
   - Allows training with external datasets (like VGGFace or MS-Celeb-1M).
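For the default verification setup, accuracy on the 6,000 pairs is usually reported as the mean over 10 cross-validation folds, with the decision threshold chosen on the training folds only. The sketch below mimics that procedure on synthetic similarity scores so it runs without downloading anything. `best_threshold` and `ten_fold_accuracy` are illustrative names, the Gaussian scores are invented, and the contiguous fold layout is a simplification: the real folds are defined by the official pairs file.

```python
import numpy as np


def best_threshold(scores, labels, n_candidates=200):
    """Return the candidate threshold with the highest accuracy on these pairs."""
    candidates = np.linspace(scores.min(), scores.max(), n_candidates)
    accs = [np.mean((scores >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accs))]


def ten_fold_accuracy(scores, labels, n_folds=10):
    """Mean and std of held-out accuracy; threshold is fit on the other folds."""
    folds = np.array_split(np.arange(len(scores)), n_folds)
    accs = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        t = best_threshold(scores[train_idx], labels[train_idx])
        accs.append(np.mean((scores[test_idx] >= t) == labels[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))


# Synthetic stand-in for 6,000 pair similarities: 10 folds x (300 matching + 300 not).
rng = np.random.default_rng(0)
labels = np.tile(np.repeat([True, False], 300), 10)
scores = np.where(
    labels,
    rng.normal(0.7, 0.15, labels.size),  # matching pairs score higher on average
    rng.normal(0.3, 0.15, labels.size),  # non-matching pairs score lower
)

mean_acc, std_acc = ten_fold_accuracy(scores, labels)
print(f"verification accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

With a real model, `scores` would be the similarity (for example, cosine similarity) between the two embeddings of each pair.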
Face Verification with LFW in Python
Load the Dataset Using sklearn
from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt

# Downloads the data on first use; keeps only people with at least 70 images
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
print("Images shape:", lfw.images.shape)
print("Target names:", lfw.target_names)

# Show some sample faces
fig, axes = plt.subplots(1, 5, figsize=(12, 4))
for i, ax in enumerate(axes):
    ax.imshow(lfw.images[i], cmap='gray')
    ax.set_title(lfw.target_names[lfw.target[i]])
    ax.axis('off')
plt.show()
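The `min_faces_per_person=70` argument keeps only identities with at least 70 images. As a rough illustration of that kind of filtering (not scikit-learn's actual implementation), the sketch below applies the same idea to a toy label array. `filter_min_faces` is a hypothetical helper name.

```python
import numpy as np


def filter_min_faces(targets, min_faces):
    """Return a sample mask and the identities that have >= min_faces images."""
    targets = np.asarray(targets)
    ids, counts = np.unique(targets, return_counts=True)
    kept_ids = ids[counts >= min_faces]
    mask = np.isin(targets, kept_ids)  # True for samples of a kept identity
    return mask, kept_ids


# Toy labels: identity 0 has 4 images, identity 1 has 1, identity 2 has 3.
toy_targets = [0, 0, 1, 2, 0, 2, 2, 0]
mask, kept_ids = filter_min_faces(toy_targets, min_faces=3)
print("kept identities:", kept_ids.tolist())
print("kept samples:", int(mask.sum()), "of", len(toy_targets))
```

Since only 1,680 of the 5,749 people in LFW have more than one image, even a threshold of 2 discards most identities, which is worth keeping in mind when choosing this parameter.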
Models Trained or Evaluated on LFW
LFW has been used as a benchmark for many face recognition models:
Model | Accuracy | Year |
---|---|---|
Eigenfaces (PCA) | ~60% | 2003 |
LBP (Local Binary Pattern) | ~78% | 2007 |
DeepFace (Facebook) | 97.35% | 2014 |
FaceNet (Google) | 99.63% | 2015 |
ArcFace (InsightFace) | 99.83%+ | 2019 |
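The deep models in this table map each face to an embedding vector and compare embeddings to decide whether two faces match. FaceNet trains its embeddings with triplet loss, while ArcFace uses an additive angular margin loss.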
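To make the two loss families concrete, here is a minimal NumPy sketch of a triplet loss on L2-normalized embeddings and of an additive angular margin applied to a target-class logit. It is a simplified illustration under stated assumptions, not the FaceNet or ArcFace training code: the function names are invented, and the margin and scale values are example defaults.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean of max(0, ||a - p||^2 - ||a - n||^2 + margin) over the batch."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def angular_margin_logit(embedding, class_weight, margin=0.5, scale=64.0):
    """Target-class logit scale * cos(theta + margin) for one embedding."""
    cos_theta = np.dot(l2_normalize(embedding), l2_normalize(class_weight))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return float(scale * np.cos(theta + margin))


# Toy 2-D embeddings: the positive is close to the anchor, the negative is orthogonal.
a = l2_normalize(np.array([[1.0, 0.0]]))
p = l2_normalize(np.array([[0.9, 0.1]]))
n = l2_normalize(np.array([[0.0, 1.0]]))

print("easy triplet loss:", triplet_loss(a, p, n))
print("swapped triplet loss:", triplet_loss(a, n, p))
print("margin logit for a perfect match:", angular_margin_logit(a[0], a[0]))
```

The triplet loss is zero once the negative is farther from the anchor than the positive by at least the margin. The angular margin lowers the target-class logit even for a perfect match, which pushes training toward tighter clusters per identity.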
Summary
Feature | Value |
---|---|
Total Images | 13,233 |
Unique People | 5,749 |
Faces per Person (min) | 1 (1,680 people have ≥2 images) |
Evaluation | Face verification (6,000 pairs) |
Focus | Real-world face recognition |
The LFW dataset was a game-changer for face recognition research. Newer and larger datasets like VGGFace2, MS-Celeb-1M, and CASIA-WebFace now dominate, and top models already score above 99% on LFW, but it remains a lightweight, well-understood benchmark that is well suited to sanity-checking models and learning the basics of facial recognition.