Edge AI
Running AI inference on local devices (phones, IoT sensors, vehicles) rather than in the cloud.
Definition
Edge AI brings AI computation to the device rather than relying on cloud connectivity. This enables lower latency (no round-trip to a server), better privacy (data stays on device), offline operation, and reduced bandwidth costs. Applications include real-time image processing, voice recognition, medical monitoring, and autonomous vehicle perception.
Edge AI typically requires model compression: quantisation (reducing the numerical precision of weights, e.g. float32 to int8), pruning (removing low-importance weights), knowledge distillation (training small student models to mimic large teacher models), and hardware-aware neural architecture search. Frameworks like TensorFlow Lite (TFLite), ONNX Runtime, and Apple's Core ML optimise models for edge deployment.
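As a concrete illustration of the quantisation step, here is a minimal sketch of post-training affine int8 quantisation in plain Python. This is a simplified illustration of the idea used by frameworks like TFLite, not their actual implementation; the function names are hypothetical.

```python
def quantise(weights, num_bits=8):
    """Map float weights onto integers in [0, 2**num_bits - 1] (affine scheme).

    Returns the integer values plus the (scale, zero_point) pair
    needed to map them back to floats. Hypothetical helper for
    illustration only.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant weight vector (range of zero).
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    # Clamp so rounding never pushes a value outside the int8 range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantise(q, scale, zero_point):
    """Recover approximate float weights from their integer codes."""
    return [(qi - zero_point) * scale for qi in q]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantise(weights)
recovered = dequantise(q, scale, zero_point)
# Each recovered weight is within one quantisation step (scale) of the original,
# while storage drops from 32-bit floats to 8-bit integers.
```

Real deployments quantise activations as well as weights and often calibrate the scale on representative input data, but the round-trip above captures the core memory/accuracy trade-off.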
Dedicated edge AI chips are proliferating: Apple Neural Engine (iPhone/Mac), Google Edge TPU, Qualcomm Hexagon DSP, NVIDIA Jetson (robotics), and Intel Movidius. As models become more capable, on-device AI for private, real-time applications is growing rapidly.
Examples
- Face ID (Apple Neural Engine)
- Google Pixel camera AI
- Tesla FSD chip
- NVIDIA Jetson (robotics)
Related Terms
Inference
Using a trained AI model to generate predictions or outputs on new input data.
Quantisation
Reducing the numerical precision of model weights to decrease memory footprint and accelerate inference with minimal accuracy loss.
GPU (Graphics Processing Unit)
Specialised processors with thousands of cores enabling the massive parallel computation required for AI training and inference.