๐️ Cityscapes Dataset: Urban Scene Understanding at Its Best
The Cityscapes dataset is a large-scale, richly annotated dataset focused on semantic understanding of urban street scenes. It’s widely used in computer vision for tasks like semantic segmentation, instance segmentation, depth estimation, and scene parsing—particularly in autonomous driving and smart city applications.
๐ What is Cityscapes?
Cityscapes contains high-resolution images of street scenes collected from 50 European cities across different seasons, weather conditions, and times of day. The focus is on pixel-level semantic annotation, especially for objects relevant to urban mobility like roads, pedestrians, cars, traffic signs, and sidewalks.
๐ Key Statistics
Feature | Description |
---|---|
๐ผ️ Number of Images | 5,000 finely annotated + 20,000 coarsely labeled |
๐️ Resolution | 2048×1024 pixels |
๐ฃ️ Cities Covered | 50 European cities |
๐ง Classes | 30+ (19 commonly used for training/benchmarking) |
๐งต Annotations | Fine + Coarse annotations, with instance-level masks |
๐ Formats Available | JSON + PNG masks |
๐งพ Annotation Types
Cityscapes supports multiple types of annotations:
-
Semantic Segmentation – Per-pixel labeling of 19 urban object classes.
-
Instance Segmentation – Differentiates between multiple instances of the same object class.
-
Panoptic Segmentation – Combines semantic and instance segmentation.
-
Depth Maps – Stereo image pairs provide disparity for depth estimation.
-
Bounding Boxes – For object detection tasks.
-
Video Sequences – Available for temporal analysis (e.g., tracking, segmentation over time).
๐ฏ 19 Key Semantic Classes
The most commonly used subset of classes (for benchmarking) includes:
-
Flat: road, sidewalk
-
Human: person, rider
-
Vehicle: car, truck, bus, train, motorcycle, bicycle
-
Construction: building, wall, fence
-
Object: pole, traffic light, traffic sign
-
Nature: vegetation, terrain
-
Sky: sky
These are color-coded in ground truth masks for easy visualization.
๐งช Common Tasks & Applications
Task | Purpose |
---|---|
Semantic Segmentation | Label each pixel with an object class |
Instance Segmentation | Identify and separate multiple instances of objects |
Depth Estimation | Reconstruct 3D scene geometry from stereo images |
Panoptic Segmentation | Combine object detection + pixel-wise labeling |
Autonomous Driving | Real-time scene understanding for navigation |
๐ป Using Cityscapes with Python
๐งฐ Dataset Structure (Simplified)
cityscapes/
├── leftImg8bit/
│ ├── train/
│ ├── val/
│ └── test/
├── gtFine/
│ ├── train/
│ ├── val/
│ └── test/
๐ผ️ Visualizing Sample Image + Mask
import matplotlib.pyplot as plt
from PIL import Image
img_path = "leftImg8bit/train/cologne/cologne_000000_000019_leftImg8bit.png"
mask_path = "gtFine/train/cologne/cologne_000000_000019_gtFine_labelIds.png"
img = Image.open(img_path)
mask = Image.open(mask_path)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Input Image")
plt.subplot(1, 2, 2)
plt.imshow(mask)
plt.title("Segmentation Mask")
plt.show()
๐ง Models Trained on Cityscapes
Many state-of-the-art semantic segmentation models are trained or benchmarked on Cityscapes:
Model | Mean IoU (19 classes) | Notes |
---|---|---|
DeepLabv3+ | ~82% | Uses atrous convolutions |
PSPNet | ~81% | Pyramid Scene Parsing |
HRNet | ~81%+ | High-resolution network |
SegFormer | ~82%+ | Transformer-based segmentation |
Swin Transformer | ~83%+ | Vision Transformer variant |
You can find pre-trained weights for many of these models via TorchHub, MMsegmentation, and Hugging Face.
๐ Download and Resources
-
๐ ️ CityscapesScripts GitHub
๐งต Summary
Feature | Value |
---|---|
Total Images | 25,000+ (Fine + Coarse) |
Resolution | 2048×1024 |
Number of Classes | 30+ (19 used for evaluation) |
Key Tasks | Segmentation, Depth, Panoptic, Video |
Focus | Urban street scenes |
License | Non-commercial research |
Cityscapes is the go-to dataset for urban scene understanding. Whether you're building an autonomous driving system or training models for street-level scene parsing, Cityscapes offers the rich annotations and real-world diversity needed for high-quality semantic learning.