Architecture

Convolutional Neural Network (CNN)

A neural network architecture using convolutional filters to extract spatial features from images or sequences.

Definition

CNNs are specialised neural networks for processing grid-structured data, particularly images. They use convolutional layers that apply learned filters across the input, detecting local features (edges, textures, patterns) while exploiting spatial locality and translation invariance. Pooling layers downsample feature maps; fully connected layers produce final predictions.

AlexNet (2012) demonstrated CNNs' transformative potential on ImageNet, outperforming hand-crafted features by a large margin and igniting the deep learning era. Subsequent architectures (VGG, ResNet, EfficientNet) improved accuracy through depth, residual connections, and neural architecture search.

While Vision Transformers (ViT) now rival CNNs on many benchmarks, CNNs remain widely deployed for their efficiency and inductive biases well-suited to local feature extraction in images, video, and audio spectrograms.

Examples

  • AlexNet (2012)
  • ResNet (image classification)
  • YOLO (object detection)
  • U-Net (medical image segmentation)