Edge AI
Running AI inference on local devices (phones, IoT sensors, vehicles) rather than in the cloud.
Definition
Edge AI brings AI computation to the device rather than relying on cloud connectivity. This enables lower latency (no round-trip to a server), better privacy (data stays on device), offline operation, and reduced bandwidth costs. Applications include real-time image processing, voice recognition, medical monitoring, and autonomous vehicle perception.
Edge AI typically requires model compression: quantisation (reducing the numerical precision of weights, e.g. float32 to int8), pruning (removing low-importance weights), knowledge distillation (training small student models to mimic large teacher models), and hardware-aware neural architecture search. Frameworks like TensorFlow Lite (TFLite), ONNX Runtime, and Apple's Core ML optimise models for edge deployment.
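As a concrete illustration of the quantisation step, here is a minimal sketch of post-training affine int8 quantisation in plain Python. This is a simplified illustration of the idea used by frameworks like TFLite, not their actual implementation; the function names are hypothetical.

```python
def quantise(weights, num_bits=8):
    """Map float weights onto integers in [0, 2**num_bits - 1] (affine scheme).

    Returns the integer values plus the (scale, zero_point) pair
    needed to map them back to floats. Hypothetical helper for
    illustration only.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant weight vector (range of zero).
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    # Clamp so rounding never pushes a value outside the int8 range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantise(q, scale, zero_point):
    """Recover approximate float weights from their integer codes."""
    return [(qi - zero_point) * scale for qi in q]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantise(weights)
recovered = dequantise(q, scale, zero_point)
# Each recovered weight is within one quantisation step (scale) of the original,
# while storage drops from 32-bit floats to 8-bit integers.
```

Real deployments quantise activations as well as weights and often calibrate the scale on representative input data, but the round-trip above captures the core memory/accuracy trade-off.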
Dedicated edge AI chips are proliferating: Apple Neural Engine (iPhone/Mac), Google Edge TPU, Qualcomm Hexagon DSP, NVIDIA Jetson (robotics), and Intel Movidius. As models become more capable, on-device AI for private, real-time applications is growing rapidly.
Examples
- Face ID (Apple Neural Engine)
- Google Pixel camera AI
- Tesla FSD chip
- NVIDIA Jetson (robotics)
Related Terms
Inference
Using a trained AI model to generate predictions or outputs on new input data.
Quantisation
Reducing the numerical precision of model weights to decrease memory footprint and accelerate inference with minimal accuracy loss.
GPU (Graphics Processing Unit)
Specialised processors with thousands of cores enabling the massive parallel computation required for AI training and inference.