Computer Vision
Computer vision is a field of AI that enables machines to interpret and understand visual information from images and video. Drawing on digital image processing and deep learning, computer vision systems can identify objects, faces, text, scenes, and actions in visual data.
How Computer Vision Works
Modern computer vision uses convolutional neural networks (CNNs) and Vision Transformers (ViTs) to extract hierarchical visual features. Object detection models (YOLO, R-CNN, DETR) identify and localise objects; segmentation models (SAM, Mask R-CNN) delineate pixel-level boundaries; generative models (diffusion models, GANs) create synthetic images. Foundation models like CLIP learn joint text-image embeddings, which enable zero-shot visual understanding: an image can be matched against text descriptions of categories the model was never explicitly trained to label.
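Two of the ideas above can be illustrated with a minimal sketch in plain Python. The first part slides a fixed Sobel kernel over a toy image, which is roughly what the earliest layer of a CNN does with kernels it has learned. The second part mimics CLIP-style zero-shot classification by picking the text label whose embedding is closest to the image embedding under cosine similarity. The function names, the toy image, and the three-dimensional embeddings are all hypothetical and chosen for illustration; a real system would obtain embeddings from a trained model and use an optimised library rather than nested loops.

```python
import math

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Sobel kernel that responds to left-to-right brightness changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 4x6 grayscale image: dark left half (0), bright right half (1).
IMAGE = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )

# Hypothetical, hand-written embeddings standing in for real model outputs.
IMAGE_EMBEDDING = [0.9, 0.1, 0.2]
LABEL_EMBEDDINGS = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

edge_map = conv2d_valid(IMAGE, SOBEL_X)
predicted = zero_shot_classify(IMAGE_EMBEDDING, LABEL_EMBEDDINGS)
print(edge_map)   # strongest responses sit where dark meets bright
print(predicted)
```

In the edge map, the non-zero responses appear only in the columns where the window straddles the dark-to-bright boundary. Deeper CNN layers combine many such low-level responses into textures, parts, and objects. In the zero-shot part, adding a new category only requires adding a new text embedding, with no retraining of the classifier.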
Key Use Cases
- Medical imaging analysis (radiology, pathology)
- Autonomous vehicle perception
- Retail checkout automation (Amazon Go)
- Industrial quality control and defect detection
- Facial recognition and access control
- Satellite imagery analysis
- Sports analytics and broadcast automation
Frequently Asked Questions
- What is computer vision?
- Computer vision is the field of AI that teaches machines to understand visual data: recognising objects, faces, text, and actions in images and video. On some narrow, well-defined tasks, such as benchmark image classification, systems match or exceed human accuracy.
- How does computer vision work?
- Computer vision systems use convolutional neural networks (CNNs) or Vision Transformers (ViTs), typically trained on millions of labelled images, to extract visual features hierarchically, from edges and textures up to objects and scenes.
- Where is computer vision used in everyday life?
- Computer vision powers facial unlock on smartphones, photo search in Google Photos, document scanning apps, automatic license plate recognition, product search in retail apps, and content moderation on social platforms.