Pascal VOC Dataset: A Classic in Computer Vision
The Pascal Visual Object Classes (VOC) dataset is one of the earliest and most influential benchmarks in computer vision, especially for object detection, image classification, segmentation, and person layout tasks. While newer datasets like COCO have taken the spotlight, Pascal VOC remains highly relevant for learning and benchmarking foundational vision models.
What is Pascal VOC?
The Pascal VOC dataset, created as part of the PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) project, provides a standardized dataset and evaluation protocol for visual object recognition.
The images are real-world photographs collected from Flickr, annotated with objects from 20 categories. The same images and categories are shared across the different tasks.
Key Features
Feature | Description |
---|---|
Years Available | Annual challenges from 2005 to 2012 (VOC 2007 and VOC 2012 are the most widely used) |
Total Images | ~11,500 (VOC 2012 trainval) |
Classes | 20 (e.g., person, dog, cat, car, bicycle) |
Tasks Supported | Classification, Detection, Segmentation, Person Layout |
Format | XML annotation per image (Pascal VOC format) |
Object Categories
Pascal VOC includes 20 object classes, grouped into four categories:
Person
- Person
Animals
- Bird, Cat, Cow, Dog, Horse, Sheep
Vehicles
- Aeroplane, Bicycle, Boat, Bus, Car, Motorbike, Train
Indoor Objects
- Bottle, Chair, Dining table, Potted plant, Sofa, TV/monitor
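In code, the 20 classes are usually handled as a fixed list of names mapped to integer labels. The sketch below uses the lowercase, space-free spellings found in the annotation files, in alphabetical order. The names `VOC_CLASSES`, `CLASS_TO_INDEX`, and `INDEX_TO_CLASS` are illustrative choices, not part of any official API, and some pipelines reserve index 0 for a background class.

```python
# The 20 VOC class names as spelled in the XML annotations, alphabetical order.
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)

# Hypothetical helper mappings between class names and integer labels.
# Shift the indices by one if your pipeline reserves 0 for "background".
CLASS_TO_INDEX = {name: index for index, name in enumerate(VOC_CLASSES)}
INDEX_TO_CLASS = {index: name for name, index in CLASS_TO_INDEX.items()}

print(len(VOC_CLASSES), CLASS_TO_INDEX["dog"], INDEX_TO_CLASS[14])
```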
Supported Tasks
1. Object Classification
Determine whether an object category is present in an image.
2. Object Detection
Detect the presence and location (bounding boxes) of objects in an image.
3. Semantic Segmentation
Pixel-wise labeling of object categories in an image.
4. Person Layout
Locate parts of a person (head, hands, feet, etc.).
Data Format: VOC XML
Each image is annotated with an XML file that follows the Pascal VOC annotation format. A minimal example with one object looks like this:
<annotation>
  <folder>VOC2007</folder>
  <filename>000001.jpg</filename>
  <size>
    <width>353</width>
    <height>500</height>
    <depth>3</depth>
  </size>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>48</xmin>
      <ymin>240</ymin>
      <xmax>195</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
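This format is still widely used. The TensorFlow Object Detection API can convert it to TFRecords, Albumentations supports its box convention (absolute xmin, ymin, xmax, ymax pixel coordinates) under the name pascal_voc, and YOLO-style pipelines typically convert the XML into their own text labels.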
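Reading such a file needs nothing beyond Python's standard library. The following is a minimal sketch, assuming the annotation contains the fields shown in the example above. The function names `parse_voc_annotation` and `to_normalized_cxcywh` are made up for illustration. The second helper shows the usual conversion to the normalized center/width/height boxes that YOLO-style label files use.

```python
import xml.etree.ElementTree as ET

# Same content as the sample annotation above, inlined so the sketch is self-contained.
SAMPLE_XML = """<annotation>
  <folder>VOC2007</folder>
  <filename>000001.jpg</filename>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return (filename, (width, height), [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes = []
    # Only direct <object> children are annotated instances.
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return root.findtext("filename"), (width, height), boxes


def to_normalized_cxcywh(box, image_size):
    """Convert absolute corner coordinates to normalized (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    return (
        (xmin + xmax) / 2 / width,
        (ymin + ymax) / 2 / height,
        (xmax - xmin) / width,
        (ymax - ymin) / height,
    )


filename, image_size, boxes = parse_voc_annotation(SAMPLE_XML)
label, *corners = boxes[0]
print(filename, image_size, label, to_normalized_cxcywh(corners, image_size))
```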
Using VOC for Object Detection
Tip: Use torchvision's VOCDetection class with PyTorch
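from torchvision.datasets import VOCDetection

# root is the directory that contains (or will receive) the VOCdevkit/ folder
dataset = VOCDetection(
    root="path/to/data",
    year="2007",
    image_set="train",
    download=True,  # downloads and extracts the archive into root
)

# Each sample is a PIL image plus a nested dict parsed from the XML annotation
image, target = dataset[0]
print(target["annotation"]["object"])  # list of annotated objects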
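The target returned for each sample is a nested dictionary that mirrors the XML, with all values kept as strings. Before training, it is usually flattened into labels and numeric boxes. Here is a minimal sketch of that step. The helper name `voc_target_to_boxes` is hypothetical, and the example dictionary is hand-written to resemble the parsed annotation, so check it against the output of your own torchvision version.

```python
def voc_target_to_boxes(target):
    """Flatten a VOCDetection-style target dict into (labels, boxes)."""
    objects = target["annotation"].get("object", [])
    # Tolerate a single object stored as a dict instead of a list.
    if isinstance(objects, dict):
        objects = [objects]
    labels, boxes = [], []
    for obj in objects:
        bndbox = obj["bndbox"]
        labels.append(obj["name"])
        # Values arrive as strings because they come straight from the XML.
        boxes.append([float(bndbox[key]) for key in ("xmin", "ymin", "xmax", "ymax")])
    return labels, boxes


# Hand-written stand-in for dataset[0][1]; assumed shape, not captured output.
example_target = {
    "annotation": {
        "filename": "000001.jpg",
        "size": {"width": "353", "height": "500", "depth": "3"},
        "object": [
            {
                "name": "dog",
                "bndbox": {"xmin": "48", "ymin": "240", "xmax": "195", "ymax": "371"},
            }
        ],
    }
}

labels, boxes = voc_target_to_boxes(example_target)
print(labels, boxes)
```

The resulting lists can be wrapped in tensors (for example with `torch.tensor(boxes)`) inside a custom transform or collate function.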
Dataset Structure
VOCdevkit/
└── VOC2007/
    ├── JPEGImages/
    ├── Annotations/
    ├── ImageSets/
    └── SegmentationClass/
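Given this layout, a split can be turned into image and annotation paths by reading its ID list. The sketch below assumes the split files live in `ImageSets/Main/` (for example `train.txt`, one image ID per line) and that images and annotations share the ID as their file stem. The function name `list_split` is illustrative. The demo builds a throwaway directory, so it runs without the real dataset.

```python
import tempfile
from pathlib import Path


def list_split(voc_root, split="train"):
    """Return (image_path, annotation_path) pairs for one split."""
    voc_root = Path(voc_root)
    split_file = voc_root / "ImageSets" / "Main" / f"{split}.txt"
    pairs = []
    for line in split_file.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        image_id = fields[0]  # ignore any extra columns after the ID
        pairs.append(
            (
                voc_root / "JPEGImages" / f"{image_id}.jpg",
                voc_root / "Annotations" / f"{image_id}.xml",
            )
        )
    return pairs


# Demo on a temporary directory that mimics the layout (no real data needed).
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "VOCdevkit" / "VOC2007"
    (demo_root / "ImageSets" / "Main").mkdir(parents=True)
    (demo_root / "ImageSets" / "Main" / "train.txt").write_text("000001\n000002\n")
    pairs = list_split(demo_root, "train")

for image_path, xml_path in pairs:
    print(image_path.name, xml_path.name)
```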
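Benchmark Results
Pascal VOC was the go-to detection benchmark before COCO, and many well-known models were first validated on it. The figures below are approximate mAP values on the VOC 2007 test set. Exact numbers depend on the training data (for example, VOC 2007 alone versus VOC 2007 plus VOC 2012) and on the input resolution.
Model | mAP on VOC 2007 | Notes |
---|---|---|
Fast R-CNN | ~70.0% | Introduced RoI pooling |
Faster R-CNN | ~73.2% | Added Region Proposal Network |
SSD | ~77.2% | Single-shot detection |
YOLOv1 | ~63.4% | Fast, real-time performance |
YOLOv3 | ~80.0% | Later YOLO version; the original paper reports COCO results only, so treat this figure as approximate |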
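The mAP figures above come from matching predicted boxes to ground truth by intersection over union (IoU) and then averaging precision over recall levels for each class. As a rough sketch of the two building blocks: a detection normally counts as correct when its IoU with a ground-truth box exceeds 0.5, and VOC 2007 uses 11-point interpolated average precision. The code treats box coordinates as continuous values, which is a simplifying assumption (the official devkit counts pixels inclusively), and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP from matching recall/precision samples."""
    total = 0.0
    for step in range(11):
        threshold = step / 10
        # Interpolated precision: best precision at any recall >= threshold.
        best = max(
            (p for r, p in zip(recalls, precisions) if r >= threshold),
            default=0.0,
        )
        total += best
    return total / 11


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
print(average_precision_11pt([0.5, 1.0], [1.0, 0.5]))
```

mAP is then the mean of the per-class AP values over the 20 classes.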
Labeling Your Own Data in Pascal VOC Format
If you’re creating a custom object detection dataset, many annotation tools, such as CVAT, can export annotations in Pascal VOC format. The exported XML files are compatible with TensorFlow and other tools that read this format.
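If your labels already exist in another form, writing VOC-style XML yourself is straightforward. This is a minimal sketch that emits only the fields shown in the annotation example earlier. The function name `build_voc_xml` and the default folder name are made up for illustration. Official VOC annotations carry additional per-object fields such as pose, truncated, and difficult, which some downstream tools may expect.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, objects, folder="custom", depth=3):
    """Build a VOC-style annotation string.

    objects is a list of (label, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    for label, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(int(value))

    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = build_voc_xml("000001.jpg", 353, 500, [("dog", 48, 240, 195, 371)])
print(xml_text)
```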
Summary
Feature | Value |
---|---|
Total Images | ~11,500 (VOC 2012 trainval) |
Classes | 20 |
Tasks | Classification, Detection, Segmentation, Person Layout |
Format | Pascal VOC XML |
Supported Tools | TensorFlow, PyTorch, YOLO, CVAT |
Despite its age, Pascal VOC remains a standard starting point for learning object detection. It is smaller and simpler than COCO, which makes it well suited to beginners, quick prototyping, and sanity-checking custom models.