Edge AI, also known as Edge Intelligence (EI), refers to the practice of running artificial intelligence (AI) computations and algorithms on devices at the network’s edge—such as end-user devices—rather than in centralized cloud data centers.
This approach aims to integrate AI capabilities into mobile devices, Internet of Things (IoT) sensors, smart cameras, robots and other edge devices operating in the real world. Fundamentally, it seeks to combine edge computing paradigms with AI techniques to create distributed, autonomous intelligence closer to the data source.
Although narrowly defined as running AI on edge devices, from a broader perspective it can also be viewed as a hierarchical collaboration between edge devices, edge servers and the cloud to optimize AI model training and inference. The degree of this collaboration varies with how much data is transferred and how far it travels.
Figure: Six Levels of Edge Intelligence (Zhou et al., “Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing”)
History and Development
The origins of Edge AI can be traced back to the 1990s with the emergence of content delivery networks (CDNs) that hosted web, video and game content on servers close to users. However, the development of Edge AI in its current sense is closely tied to the evolution of computing models: from centralized mainframe computers to personal computers, client-server (C/S) and browser-server (B/S) architectures, and subsequently to cloud computing.
While the widespread adoption of cloud computing in the 2000s provided massive processing power and storage capacity in centralized data centers, challenges such as high latency, bandwidth bottlenecks and data privacy concerns—exacerbated by the explosion of mobile devices and IoT—necessitated new approaches. In response, edge computing paradigms were developed to move processing power to the network’s edge:
- Cloudlet: Introduced in 2009 as a local, small-scale “data center in a box” concept for mobile devices.
- Fog Computing: Introduced by Cisco in 2012 as an intermediate layer between the cloud and devices, particularly for IoT applications.
- Mobile/Multi-access Edge Computing (MEC): Defined by ETSI in 2014 to provide low latency and high bandwidth in mobile networks, later expanded to include other access networks.
- Micro Data Centers (mDCs): A concept of compact, portable or ruggedized data centers tailored for industrial applications.
The emergence of Edge AI has been enabled by the advancement of these edge computing infrastructures, increased efficiency of artificial intelligence algorithms—particularly deep learning—and the proliferation of IoT devices. It began to gain prominence in industry reports such as Gartner’s Hype Cycle around 2018 and has since attracted rapid interest in both academia and industry. Today, Edge AI is recognized as a critical technology that carries AI across the “last mile” of the network.
Importance and Motivation
The rise of Edge AI stems from various technological and application needs:
1. Low Latency: Bringing AI computations closer to where data is generated or to the end user eliminates the delay of sending data to the cloud and waiting for results, enabling millisecond-level response times critical for time-sensitive applications such as autonomous vehicles, real-time video analysis, industrial control and AR/VR.
2. Bandwidth and Cost Savings: Processing large volumes of raw data—especially video—at the edge rather than transmitting it to the cloud significantly reduces network traffic and associated communication costs. Often, only meaningful results or model updates need to be sent to the cloud.
3. Privacy and Security: Processing sensitive data—such as health records, personal images or industrial secrets—locally on the device or a trusted edge server minimizes the risk of data exposure during transmission and centralized storage, thereby enhancing privacy and data security. Techniques like Federated Learning provide additional safeguards.
4. Reliability and Accessibility: Edge devices can execute AI tasks locally even when network connectivity is intermittent or absent, ensuring operational continuity for critical infrastructure and industrial applications.
5. Unlocking Edge Data Potential: The vast volume of data generated by billions of IoT and mobile devices is predominantly produced at the network’s edge. Edge AI is an essential tool for extracting real-time insights and meaningful information from this data.
6. New Applications and Capabilities: Edge AI enables numerous new application scenarios previously impossible or impractical: advanced automation, smarter devices, personalized experiences and context awareness are among them.
7. Scalability and Distribution: By distributing AI workloads from centralized cloud systems to distributed edge devices, overall system scalability improves and congestion at a single central point is reduced.
For these reasons, Edge AI deployments are expected to grow substantially in the coming years.
Working Principles and Architectures
Edge AI systems typically operate within a hierarchical structure comprising edge devices (data-collecting sensors, cameras, mobile phones, etc.), edge servers (more powerful computing units connected to access points, base stations or local network gateways), and the cloud (for centralized storage, intensive model training and global coordination). Within this architecture, AI model training and inference can be distributed in various ways:
Model Training
- Cloud Training: The most common approach. Data is sent from the edge to the cloud, and models are trained centrally.
- Federated Learning: A privacy-focused approach where data remains on the device and only model updates—such as gradients or weights—are aggregated on a central server (cloud or edge server). This is particularly important when working with sensitive data or when reducing communication costs is desired. Various variants exist (communication-efficient, resource-optimized, security-enhanced).
- Distributed Edge Training: Models are trained collaboratively across edge devices or edge servers, using either a central coordinator or peer-to-peer methods such as Gossip protocols.
- Transfer Learning / Knowledge Distillation: Knowledge learned from a large pre-trained model (teacher) is transferred to a smaller model (student) designed to run on edge devices, accelerating training and reducing resource requirements.
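The Federated Learning scheme described above—clients keep their data local and share only model updates, which a server aggregates by a data-size-weighted mean (Federated Averaging)—can be sketched in a few lines. Everything below is a toy illustration: the model is a plain list of floats, and `local_update` is a hypothetical stand-in for real gradient descent on each client's data.

```python
# Toy sketch of Federated Averaging (FedAvg): clients train locally and
# send back only updated weights; the server aggregates them weighted by
# each client's data size. Not a real training loop.

def local_update(weights, client_data, lr=0.1):
    """Hypothetical local step: nudge each weight toward the client's
    data mean (stands in for gradient descent on local data)."""
    target = sum(client_data) / len(client_data)
    return [w - lr * (w - target) for w in weights]

def fedavg(global_weights, client_datasets, rounds=5):
    for _ in range(rounds):
        updates, sizes = [], []
        for data in client_datasets:
            # Raw data never leaves the client; only weights are shared.
            updates.append(local_update(list(global_weights), data))
            sizes.append(len(data))
        total = sum(sizes)
        # Server-side aggregation: w = sum_k (n_k / n) * w_k
        global_weights = [
            sum(s / total * u[i] for u, s in zip(updates, sizes))
            for i in range(len(global_weights))
        ]
    return global_weights

clients = [[1.0, 1.2], [0.8, 1.0, 0.9], [1.1]]
model = fedavg([0.0, 0.0], clients)
```

Communication-efficient variants compress or sparsify the shared updates; security-enhanced variants add secure aggregation or differential privacy on top of this same loop.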
Model Inference
- On-Device Inference: The model runs entirely on the edge device. It offers the lowest latency and highest privacy but is constrained by the device’s resources. Model compression techniques play a critical role here.
- Edge Inference: The device sends data to a nearby edge server, inference is performed there, and the result is returned to the device.
- Device-Edge Co-inference: The model is split between the device and the edge server. The device runs the initial layers, sends compressed intermediate data to the edge server, and the remaining layers are executed there. Dynamically selecting the split point (e.g., Neurosurgeon, Edgent) can optimize performance.
- Edge-Cloud Co-inference: The device sends data to the edge, and the edge may forward it to the cloud if needed, or the model is split between edge and cloud. This is used especially in scenarios requiring intensive computation.
- Early Exit: Exit points are added at intermediate layers of the model. For simple inputs, results are obtained early from an intermediate layer without running the full model, reducing latency.
- Edge Caching: Previously computed inference results or intermediate features for similar inputs are cached at the edge to avoid redundant computations.
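Of the inference strategies above, Early Exit is straightforward to sketch. In the toy example below, the stage and exit-head functions are hypothetical stand-ins for a real network's layers and classifier heads; what it shows is the control flow: inference stops at the first exit whose confidence clears a threshold, so easy inputs skip the later, more expensive stages.

```python
# Toy early-exit sketch: a "model" is a chain of stages, each paired with
# an exit head that returns (prediction, confidence). Inference stops at
# the first exit whose confidence clears the threshold.

def run_with_early_exit(stages, exits, x, threshold=0.9):
    executed = 0
    for stage, exit_head in zip(stages, exits):
        x = stage(x)
        executed += 1
        pred, conf = exit_head(x)
        if conf >= threshold:
            break  # early exit: remaining stages are skipped
    return pred, executed

# Hypothetical two-stage pipeline: stage 1 is cheap, stage 2 refines.
stages = [lambda x: x * 2, lambda x: x + 1]
exits = [
    lambda h: ("easy", 0.95 if h > 10 else 0.5),  # confident only on large activations
    lambda h: ("hard", 1.0),                      # final exit always answers
]

pred, n = run_with_early_exit(stages, exits, 6)  # 6*2 = 12 > 10, so it exits after stage 1
```

In a real deployment the early stages would run on the device and the later ones on an edge server, so an early exit also saves the transmission step—combining this idea with device-edge co-inference.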
Autonomous Edge AI
Recently, a vision has emerged of making Edge AI systems autonomous using large language models (LLMs) such as GPT. In this approach, the LLM acts as a controller in the cloud: it understands user requests in natural language, evaluates the capabilities of available AI models, decomposes tasks into subtasks, selects and coordinates appropriate models, and can even automatically generate code, for example for Federated Learning.

Figure: Federated Learning Process (Source: Wang et al. 2019, “In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning”)
Application Areas
Edge AI has application potential across nearly every industry. Key application areas include:
- Smart Cities and Homes: Local intelligence in devices such as traffic management systems, smart parking, video analytics for public safety, energy management, voice assistants, smart thermostats and security cameras.
- Autonomous Vehicles and Transportation: Real-time processing of sensor data for vehicle perception, immediate decision-making (e.g., emergency braking), route planning and communication with other vehicles (V2V) or roadside units (RSU).
- Industrial IoT (IIoT) and Smart Manufacturing: Factory automation, predictive maintenance (monitoring machine conditions and predicting failures), quality control (defect detection via image processing on production lines), robotic control and hazardous environment monitoring.
- Healthcare: Real-time patient monitoring and anomaly detection using data from wearable devices, early warning systems and medical image analysis.
- Video Analytics and Surveillance: Performing tasks such as facial recognition, object detection and crowd analysis on servers near cameras to reduce bandwidth requirements and latency.
- Retail and Marketing: Customer behavior analysis and delivery of personalized recommendations and advertisements.
- Augmented/Virtual Reality (AR/VR): Delivering low-latency, high-quality experiences that respond instantly to user movements and surroundings.
- Drones and Robots: Processing autonomous navigation, environmental perception and task execution capabilities on the device or nearby.
Enabling Technologies
The implementation of Edge AI is made possible by various technological advancements in hardware and software:
Hardware
- Specialized AI Accelerators: Chips designed to efficiently run AI models on edge devices with low power consumption. Examples: Google Edge TPU, Intel Movidius VPU/Nervana NNP, Huawei Ascend, Qualcomm Snapdragon (with NPU/APU), HiSilicon Kirin (NPU), MediaTek Helio (APU), NVIDIA Jetson (mobile GPU).
- Mobile CPUs and GPUs: Standard processors in smartphones and other edge devices can be used for lightweight AI tasks when combined with optimized AI libraries.
- FPGAs (Field-Programmable Gate Arrays): Programmable hardware offering low power consumption and flexibility, enabling customization for specific AI workloads.
- Edge Server Hardware: Servers integrated into micro data centers (mDCs) or base stations/network gateways provide higher computational power.
Software and Algorithms
- AI/ML Models and Algorithms: Deep learning models (CNNs, RNNs/LSTMs) and Deep Reinforcement Learning (DRL) form the foundation of Edge AI. Algorithms optimized for resource-constrained environments are essential.
- Model Compression Techniques: Methods that reduce model size and computational requirements to enable execution on edge devices: Weight Pruning, Data Quantization, Compact Architecture Design, Knowledge Distillation.
- Federated Learning: A framework for distributed, privacy-preserving model training.
- AI Frameworks and Libraries: AI development and execution environments optimized for edge devices: TensorFlow Lite, PyTorch Mobile, Core ML, ONNX Runtime, OpenVINO, CMSIS-NN, SNPE, etc.
- Edge Computing Platforms and Frameworks: Software used to manage, deploy and orchestrate edge devices and applications: KubeEdge, Azure IoT Edge, AWS IoT Greengrass, EdgeX Foundry, Akraino, OpenNESS, etc.
- Network Technologies: 5G and beyond (6G), ultra-low latency (URLLC), Software-Defined Networking (SDN), Network Function Virtualization (NFV), and Network Slicing provide the necessary infrastructure for Edge AI.
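As a concrete illustration of one compression technique from the list above, here is a minimal sketch of symmetric post-training weight quantization: float weights are mapped to 8-bit integers with a single per-tensor scale, cutting storage roughly 4x versus 32-bit floats. This is a simplified illustration; production toolchains such as TensorFlow Lite add calibration data, per-channel scales and quantization-aware training.

```python
# Toy symmetric int8 quantization: map floats to integers in [-127, 127]
# with one shared scale factor, then reconstruct approximations on demand.

def quantize_int8(weights):
    # Scale so the largest-magnitude weight maps to +/-127.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]  # small ints, cheap to store
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error per weight is bounded by half a quantization step (s / 2).
```

The other listed techniques trade accuracy for size along different axes: pruning removes near-zero weights entirely, while knowledge distillation trains a small model to mimic a large one rather than shrinking its numeric representation.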
Challenges and Future Directions
Although Edge AI is a rapidly evolving field, significant challenges remain to be addressed, along with emerging future directions:
- Resource Constraints: Limited computational power, memory, storage and battery life of edge devices make running complex AI models challenging. Energy efficiency is a critical concern.
- Algorithm and Model Management: Designing, compressing, deploying, updating and managing AI models tailored to diverse hardware and application requirements is complex. Compatibility issues may arise across heterogeneous devices.
- Security and Privacy: The distributed nature of edge devices makes them more vulnerable to security threats compared to centralized systems—such as data poisoning, model theft and unauthorized access. Ensuring data privacy (compliance with regulations like GDPR) and developing trusted mechanisms (e.g., blockchain) is essential.
- Network Management and Optimization: Dynamic network conditions (latency, bandwidth fluctuations), device mobility and coordination of large numbers of devices require intelligent management mechanisms for resource allocation (computation, communication, caching) and task scheduling. The concept of the “Intelligent Edge”—using AI to optimize the edge network—is a key direction in this area.
- Standardization and Interoperability: Common standards and open platforms are needed to ensure seamless interoperability between hardware and software components from different vendors.
- Data Management: Efficiently collecting, labeling (especially for supervised learning), filtering and processing the vast amounts of data generated at the edge is challenging. Data quality and non-IID (non-independent and identically distributed) data distributions pose difficulties for methods like Federated Learning.
- Autonomous Systems: There is a growing trend toward autonomous Edge AI systems capable of self-organization, adaptation and optimization, supported by technologies such as LLMs.
- Incentive Mechanisms: In a distributed ecosystem, economic models must be developed to incentivize different stakeholders—such as device owners and edge server providers—to share resources and collaborate.
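The non-IID difficulty noted under Data Management can be made concrete with a toy measurement: compare each client's label histogram to the global one. The labels and the total-variation skew measure below are illustrative, not drawn from any particular dataset; the point is that an edge client seeing only one class diverges maximally from the global distribution, which is what destabilizes naive federated averaging.

```python
# Toy illustration of non-IID client data: measure how far each client's
# label distribution deviates from the global one (total variation distance,
# 0 = identical mix, approaching 1 = completely disjoint).
from collections import Counter

def label_distribution(labels):
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}

def skew(global_labels, client_labels):
    g = label_distribution(global_labels)
    c = label_distribution(client_labels)
    keys = set(g) | set(c)
    return 0.5 * sum(abs(g.get(k, 0) - c.get(k, 0)) for k in keys)

all_labels = ["cat"] * 50 + ["dog"] * 50
iid_client = ["cat"] * 5 + ["dog"] * 5  # mirrors the global mix
noniid_client = ["cat"] * 10            # sees only one class
```

A federated system can use such a measure to weight or cluster clients, one of the mitigation strategies studied for non-IID Federated Learning.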