
LFW (Labeled Faces in the Wild): Face Recognition in the Real World

 


The Labeled Faces in the Wild (LFW) dataset is a well-known benchmark for face recognition and face verification in unconstrained environments. Created by researchers at the University of Massachusetts Amherst, it was among the first large-scale datasets that captured faces in everyday, "in-the-wild" scenarios—far from studio-controlled settings.


🧠 What is LFW?

LFW contains thousands of face images collected from news articles on the web, representing over 5,700 individuals. The dataset’s main goal is to evaluate algorithms for:

  • Face Verification – Are two faces the same person?

  • 🧭 Face Recognition – Who is the person in the image?
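
To make the difference between the two tasks concrete, here is a minimal sketch. It assumes each face has already been turned into an embedding vector by some model (not shown). The function names, the 0.5 threshold, and the toy vectors are illustrative assumptions, not part of LFW.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb_a, emb_b, threshold=0.5):
    """Verification (1:1): are the two faces the same person?"""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(probe, gallery):
    """Recognition (1:N): name of the gallery entry most similar to the probe."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))

# Toy 3-D "embeddings" (hypothetical values, for illustration only)
gallery = {
    "Person_A": [1.0, 0.0, 0.0],
    "Person_B": [0.0, 1.0, 0.0],
}
probe = [0.9, 0.1, 0.0]

same_person = verify(probe, gallery["Person_A"])  # compares one pair
best_match = identify(probe, gallery)             # searches the whole gallery
```

Verification only ever looks at one pair, so it works for people the model has never seen; recognition needs a gallery of known identities to search.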


📊 Dataset Overview

Feature | Details
Total Images | 13,233 face images
Individuals | 5,749 people
People with >1 image | 1,680
Source | News photographs collected from the web (faces found with the Viola-Jones detector)
Image Size | 250×250 pixels (centered, cropped)
Format | JPEG

🔢 How is LFW Organized?

LFW is distributed in three versions:

  1. LFW Funneled – Images are aligned with "funneling", an unsupervised congealing-based alignment method, for easier benchmarking.

  2. LFW Raw – Original cropped faces without alignment.

  3. LFW DeepFunneled – Higher-quality alignment using deep learning.

Each file is named like:

[Person_Name]/[Person_Name]_[Image_Number].jpg

Example:

George_W_Bush/George_W_Bush_0001.jpg
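
The naming scheme is easy to build and parse programmatically. The sketch below assumes four-digit zero padding, as in the example above; the helper names are hypothetical and not part of any LFW tooling.

```python
from pathlib import PurePosixPath

def lfw_image_path(person_name, image_number):
    """Build the relative path of an LFW image from a name and an index."""
    folder = person_name.replace(" ", "_")
    return PurePosixPath(folder) / f"{folder}_{image_number:04d}.jpg"

def parse_lfw_filename(path):
    """Split an LFW path back into (person_name, image_number)."""
    stem = PurePosixPath(path).stem          # e.g. "George_W_Bush_0001"
    name, _, number = stem.rpartition("_")   # split at the last underscore
    return name, int(number)

path = lfw_image_path("George W Bush", 1)
name, number = parse_lfw_filename(path)
```

Splitting at the last underscore matters because the person name itself contains underscores.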

🔍 Evaluation Protocols

LFW provides multiple evaluation setups:

1. Face Verification (default)

  • Compares pairs of faces.

  • 6,000 face pairs (3,000 matching, 3,000 non-matching).

  • Accuracy is usually reported as the mean over 10 cross-validation folds of 600 pairs each.

2. Unrestricted with Labeled Outside Data

  • Allows training with external datasets (like VGGFace or MS-Celeb-1M).
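
The verification protocol can be sketched in a few lines. This is a simplified illustration, not the official evaluation script: it assumes you already have one similarity score per pair, that the pairs are ordered fold by fold, and it uses synthetic scores in place of a real model's output. For each fold, the decision threshold is tuned on the other nine folds and applied to the held-out one.

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick the threshold that maximizes accuracy on the given (training) pairs."""
    candidates = np.unique(scores)
    accuracies = [np.mean((scores >= t) == labels) for t in candidates]
    return candidates[int(np.argmax(accuracies))]

def ten_fold_accuracy(scores, labels, n_folds=10):
    """Mean and standard deviation of verification accuracy over the folds."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    folds = np.array_split(np.arange(len(scores)), n_folds)
    fold_acc = []
    for k, test_idx in enumerate(folds):
        # Tune the threshold on every fold except the held-out one
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        t = best_threshold(scores[train_idx], labels[train_idx])
        fold_acc.append(np.mean((scores[test_idx] >= t) == labels[test_idx]))
    return float(np.mean(fold_acc)), float(np.std(fold_acc))

# Synthetic similarity scores stand in for a real model's output.
rng = np.random.default_rng(0)
labels = np.tile(np.repeat([True, False], 300), 10)  # 10 folds x (300 + 300) pairs
scores = np.where(labels,
                  rng.normal(0.7, 0.15, labels.size),   # matching pairs score higher
                  rng.normal(0.3, 0.15, labels.size))   # non-matching pairs score lower
mean_acc, std_acc = ten_fold_accuracy(scores, labels)
print(f"accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

Tuning the threshold only on the training folds keeps the reported accuracy from being inflated by the held-out pairs.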


🧪 Using LFW in Python

Load the Dataset Using sklearn

from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt

# Downloads the data on first use. Keeps only people with at least
# 70 images and downscales each image to 40% of its original size.
lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

print("Images shape:", lfw.images.shape)  # (n_samples, height, width)
print("Target names:", lfw.target_names)

# Show the first five faces with their labels
fig, axes = plt.subplots(1, 5, figsize=(12, 4))
for i, ax in enumerate(axes):
    ax.imshow(lfw.images[i], cmap='gray')
    ax.set_title(lfw.target_names[lfw.target[i]])
    ax.axis('off')
plt.show()
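
For verification experiments, scikit-learn also provides fetch_lfw_pairs, which returns image pairs labeled as same or different person. The sketch below wraps it in a function (the download happens on the first call) and scores each pair with raw-pixel cosine similarity. That scoring is a deliberately weak baseline chosen only for illustration, and both helper names are hypothetical.

```python
import numpy as np

def load_lfw_pairs(subset="test", resize=0.4):
    """Fetch LFW pairs via scikit-learn (downloads the data on first use)."""
    from sklearn.datasets import fetch_lfw_pairs  # imported lazily: needs scikit-learn
    data = fetch_lfw_pairs(subset=subset, resize=resize)
    # pairs has shape (n_pairs, 2, height, width); target is 1 for "same person"
    return data.pairs, data.target

def pixel_cosine_scores(pairs):
    """Cosine similarity between the flattened pixels of each image pair."""
    pairs = np.asarray(pairs, dtype=float)
    left = pairs[:, 0].reshape(len(pairs), -1)
    right = pairs[:, 1].reshape(len(pairs), -1)
    norms = np.linalg.norm(left, axis=1) * np.linalg.norm(right, axis=1)
    return np.sum(left * right, axis=1) / norms
```

Calling pairs, target = load_lfw_pairs() and then pixel_cosine_scores(pairs) gives one score per pair, which can be thresholded to predict "same" or "different". A real system would replace the pixel comparison with embeddings from a trained model.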

🧠 Models Trained or Evaluated on LFW

LFW has been used as a benchmark for many face recognition models:

Model | Accuracy (%) | Year
Eigenfaces (PCA) | ~60 | 1991
LBP (Local Binary Pattern) | ~78 | 2007
DeepFace (Facebook) | 97.35 | 2014
FaceNet (Google) | 99.63 | 2015
ArcFace (InsightFace) | 99.83+ | 2019

The most recent models in this list learn face embeddings: FaceNet is trained with triplet loss, while ArcFace uses an additive angular margin loss.
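
Both loss families can be written down compactly. The NumPy sketch below is for intuition only and is not the training code of any of these models. The default margin of 0.2 for triplet loss and the margin 0.5 and scale 64 for the angular margin follow the values reported in the FaceNet and ArcFace papers; the function names are assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale each embedding to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings.

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (in squared Euclidean distance).
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    pos_dist = np.sum((a - p) ** 2, axis=-1)
    neg_dist = np.sum((a - n) ** 2, axis=-1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

def angular_margin_logit(cos_theta, margin=0.5, scale=64.0):
    """ArcFace-style logit for the true class: scale * cos(theta + margin)."""
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return scale * np.cos(theta + margin)

# A well-separated triplet and a badly ordered one (toy 2-D embeddings)
easy = triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])
hard = triplet_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]])
```

Adding the margin to the angle lowers the true-class logit, so the network has to pull same-identity embeddings closer together to compensate.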




🧵 Summary

Feature | Value
Total Images | 13,233
Unique People | 5,749
Faces per Person (min) | 1 (1,680 people have ≥2 images)
Evaluation | Face verification (6,000 pairs)
Focus | Real-world face recognition

The LFW dataset was a game-changer for face recognition research. Newer and larger datasets such as VGGFace2, MS-Celeb-1M, and CASIA-WebFace now dominate model training, and the best models score above 99% on LFW, but it remains a lightweight benchmark that is well suited to sanity-checking models and learning the basics of face recognition.
