Orchestrating AI: Event-Driven Architectures for Complex AI Workflows

  • By Gcore
  • May 23, 2024
  • 7 min read

This article was originally published on The New Stack. It’s written by Georgina Tryfou, a machine learning engineer at Gcore and an AI expert with more than 15 years of experience in machine learning and speech recognition.


Amid the current AI frenzy, complex AI workflows are becoming increasingly popular among companies that want to enhance their offerings with AI capabilities. In this article, I’ll share a behind-the-scenes look at how we implement event-driven architecture (EDA) in complex AI workflows at Gcore. I’ll walk you through the initial challenges, the architectural decisions we made, and the outcomes of employing an EDA in a dynamic, real-world scenario. Along the way, I’ll show how EDA enhances system responsiveness, scalability, and flexibility for managing AI-driven tasks like subtitle generation for video content.

Why Event-Driven Architecture Matters for AI

Event-driven architecture (EDA) is a design pattern centered around producing, detecting, consuming, and reacting to events rather than running static, predefined operations. An event is any significant change in state, or any update, that occurs within the system. EDA allows different parts of a system to communicate and operate independently, driven by the occurrence of these events, which can be anything from a user action to a completed process.
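To make the definition concrete, here is a minimal sketch of producers and consumers that communicate only through events. It is an in-process illustration, not the Gcore implementation: the `EventBus` class and the event names `video.uploaded` and `speech.detected` are assumptions made up for this example.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer does not know which components consume the event.
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
log = []


def on_video_uploaded(event):
    # Reacts to a user action, then emits a new event when its work is done.
    log.append(f"detect speech in {event['video_id']}")
    bus.publish("speech.detected", {"video_id": event["video_id"], "segments": 3})


def on_speech_detected(event):
    # Reacts to a completed process.
    log.append(f"transcribe {event['segments']} segments of {event['video_id']}")


bus.subscribe("video.uploaded", on_video_uploaded)
bus.subscribe("speech.detected", on_speech_detected)
bus.publish("video.uploaded", {"video_id": "v1"})
```

Neither handler calls the other directly, so either one can be replaced or scaled without touching the rest of the system.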

The adoption of EDA in AI workflow management marks a significant evolution from traditional architectures, such as monolithic, service-oriented, or polling-based architectures. Its principles of asynchronous communication, decoupling, and dynamic scalability align perfectly with the demands of modern AI applications, with three key benefits:

  • The architecture’s modularity makes it easier to scale specific components independently, such as scaling up language processing during high-demand periods in customer service applications without affecting other parts of the system.
  • EDA’s modular design simplifies the process of updating or replacing models with newer versions, as seen in health tech environments where predictive algorithms are frequently refined and deployed to keep pace with medical advancements or newer data.
  • The flexible nature of EDA allows for the seamless integration of various models to realize a complex AI workflow, such as combining image recognition with predictive maintenance in manufacturing, enhancing system robustness and operational efficiency.

These benefits, observed across different sectors, enhance the scalability and responsiveness of AI systems and also their robustness and adaptability, making EDA indispensable for managing complex, multi-model AI workflows across industries and use cases.

Implementing Event-Driven Architecture in AI: A Practical Case Study

At Gcore, we’ve implemented EDA within Gcore Video Streaming AI features. Today, I’ll share with you how we apply EDA for AI subtitle generation for video.

This project began with the goal of improving the efficiency, latency, scalability, and reliability of subtitle generation in multiple languages from raw video content. The process involves several complex steps:

  1. Video decompression: The video file is either decompressed or transcoded into a format suitable for processing.
  2. Speech detection: Segments of the video where speech occurs are identified and distinguished from background noise or silence using specialized ML models.
  3. Speech-to-text conversion (transcription): The detected speech is converted into text. This step requires inference using complex speech recognition models capable of handling a range of languages, accents, and dialects.
  4. Text post-processing: Transcription errors, punctuation, and grammar are corrected. The text is formatted to match the video’s timing; for example, it could be broken into timed subtitles.
  5. Translation (optional): If subtitles are required in multiple languages, the transcribed text may be translated into one or more target languages, again via inference using specialized machine-translation models.
  6. Subtitle synchronization: Subtitle display is timed to match the speech in the video, ensuring that the subtitles appear on screen precisely when the corresponding speech is heard.
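As an illustration of the final steps, the sketch below turns timed text cues into subtitle blocks. The article does not say which subtitle format is produced; the widely used SubRip (SRT) layout is assumed here, and the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


srt_text = to_srt([(0.0, 1.5, "Hello"), (1.5, 4.0, "and welcome.")])
```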

Each of these steps requires specialized AI models or algorithms and may require data processing in real- or near-real time, especially in live-streaming scenarios. The result? Serious complexity.

The complexity arises not only from the technical challenges associated with each task, but also from the need to efficiently manage the flow of data between steps, handle errors or exceptions, and scale resources dynamically based on demand.

In our pursuit of orchestrating such sophisticated and demanding AI workflows, we designed an AI system that functions with precision and agility through a well-defined EDA. The architecture of this platform, outlined in the figure below, addresses all stages of AI-driven tasks, facilitates communication between components, and ensures that each task can be dynamically scaled and autonomously handled.

Four core components underlie the Gcore Streaming AI platform backend. All these components are versatile and essential to a wide range of AI applications.

Django: API Service

At the front of the architecture lies the API service, which uses the robust Django framework. This is the primary interface for user interactions and processes incoming requests for varied services including transcription and content moderation services like nudity detection. This layer validates and parses incoming requests, triggering a cascade of subsequent tasks in the workflow, as represented on the far left of the diagram above where a user initiates a transcription request to the API service.

Celery: Processing Engine and Task Orchestration

Diving deeper into the backend, we leverage Celery, an asynchronous task queue that acts as a robust background processing engine. Celery manages AI processes, such as transcribing audio to text or analyzing content for nudity, and other standalone processes, such as synchronizing transcribed content into subtitles. Celery, in combination with Redis, which acts as the message broker, orchestrates these tasks and ensures that the initiation and completion of each task are driven by the occurrence of predefined events.

Celery’s ability to handle AI workflows is enhanced by a suite of advanced features for orchestrating complex workflows: groups, chains, and chords. These tools allow for the decomposition of high-level, complex AI tasks into granular subtasks, handling of their dependencies, and aggregation of their results.
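The three primitives can be pictured with a standard-library analogue. This sketch deliberately does not use the Celery API: `make_chain`, `run_group`, and `run_chord` are hypothetical helpers that only mirror the ideas (sequential composition, parallel fan-out, and fan-out followed by an aggregating callback), and the three task functions are stand-ins for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def make_chain(*steps):
    """Chain: feed each step's result into the next step."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def run_group(task, items, max_workers=4):
    """Group: run one task over many items concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


def run_chord(task, items, callback):
    """Chord: a group whose collected results go to a single callback."""
    return callback(run_group(task, items))


# Hypothetical stand-ins for real model calls.
def detect_speech(video_id):
    return [f"{video_id}:seg{i}" for i in range(3)]


def transcribe(segment):
    return f"text({segment})"


def merge(texts):
    return " ".join(texts)


pipeline = make_chain(
    detect_speech,
    lambda segments: run_chord(transcribe, segments, merge),
)
result = pipeline("v1")
```

Here one high-level request is decomposed into three transcription subtasks, their dependency on speech detection is expressed by the chain, and their results are aggregated by the callback.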

Redis: Broker and Mediator Pattern

Redis plays a crucial role in our system as the broker and mediator, managing the distribution and coordination of tasks across the backend. It utilizes its fast, in-memory data structure store to handle the task queue efficiently. Within our architecture, task signatures and chains act as the mediators controlling the flow and logic of task execution. This mediation is based on event signals indicating task completion.

Redis’ ability to process these signals quickly is vital for maintaining a dynamic and responsive workflow, as shown in the diagram above: tasks are received by the Redis broker, directed to the appropriate processing containers, and their results are collected post-inference for seamless task transitions and data integrity.

AI Celery Workers: Dedicated AI Task Handling

Each AI Celery worker is dedicated to a specific AI task, deploying and managing AI models such as Whisper for transcription and Pyannote for voice activity detection (VAD). These workers operate in isolated environments so that each task is processed in a controlled and secure manner, minimizing the risk of interference between tasks. This setup enhances the scalability of our system by allowing each worker to scale independently based on task demands while maintaining high reliability and efficiency in AI model execution.
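One way to picture dedicated workers is one queue per task type, each consumed by its own worker. The sketch below uses threads and in-memory queues purely for illustration; the queue names, handlers, and `start_worker` helper are assumptions, and a real deployment would run separate processes or containers behind a message broker.

```python
import queue
import threading


def start_worker(name, handler, inbox, results):
    """Start a worker thread that only consumes jobs from its own queue."""
    def loop():
        while True:
            job = inbox.get()
            if job is None:  # sentinel value: shut the worker down
                break
            results.put((name, handler(job)))

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


# One queue per task type, so each kind of worker can be scaled on its own.
queues = {"vad": queue.Queue(), "transcription": queue.Queue()}
results = queue.Queue()

# Hypothetical handlers standing in for real model inference.
handlers = {
    "vad": lambda job: f"speech regions for {job}",
    "transcription": lambda job: f"transcript of {job}",
}

workers = [start_worker(name, handlers[name], queues[name], results) for name in queues]

queues["vad"].put("v1.wav")
queues["transcription"].put("v1.wav#0")

for inbox in queues.values():
    inbox.put(None)
for worker in workers:
    worker.join()

collected = sorted(results.get_nowait() for _ in range(results.qsize()))
```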

System Requirements Unveiled: Scaling, Reliability, and Latency

The Gcore backend I just described produces three major benefits that are particularly important for AI workflows: scaling, reliability, and latency reduction.

Scaling

The platform scales to handle varying demand by dynamically allocating cloud resources and leveraging GPU acceleration for intensive ML tasks. This results in seamless scaling, avoiding the performance bottlenecks and high costs typical of traditional systems. By adapting computing power in real time, our system efficiently manages workloads during both peak and off-peak times without compromising performance.

Reliability

AI features within Gcore Video Streaming are designed for high reliability with robust fault tolerance and sophisticated error handling. Strategies like data replication and automatic recovery mechanisms promote system continuity even during failures. In video transcription, if a segment of audio is corrupted, our system can either skip that segment or retry processing it, rather than wasting resources on discarding or retrying the whole audio track.
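The retry-or-skip behavior for a single segment can be sketched as follows. The retry limit, the use of `ValueError` to signal a failed segment, and the function names are assumptions for illustration; the article does not describe the actual policy.

```python
def process_segments(segments, transcribe, max_attempts=3):
    """Transcribe each segment independently; retry failures, then skip."""
    results, skipped = {}, []
    for index, segment in enumerate(segments):
        for attempt in range(1, max_attempts + 1):
            try:
                results[index] = transcribe(segment)
                break
            except ValueError:
                # Only this segment is retried, never the whole audio track.
                if attempt == max_attempts:
                    skipped.append(index)
    return results, skipped


calls = {}


def flaky_transcribe(segment):
    """Hypothetical model call: one transient failure, one corrupted input."""
    calls[segment] = calls.get(segment, 0) + 1
    if segment == "corrupt":
        raise ValueError("unreadable audio")
    if segment == "noisy" and calls[segment] == 1:
        raise ValueError("transient failure")
    return segment.upper()


results, skipped = process_segments(["clean", "noisy", "corrupt"], flaky_transcribe)
```

The transient failure succeeds on its second attempt, while the corrupted segment is skipped after three attempts without affecting its neighbors.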

Latency Reduction

System latency for AI elements is reduced by minimizing idle times and enhancing the transition speed between tasks. We employ three key strategies:

  • Segmenting large tasks into smaller parts for parallel processing across multiple GPUs
  • Optimizing workflows for immediate task transitions
  • Smartly scheduling resources to keep computational assets fully engaged

In video transcription, rather than processing the entire video at once, we break it into segments for concurrent processing. This approach shortens transcription times and helps resources be used efficiently, boosting overall system responsiveness.
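A rough sketch of this segmentation follows. The fixed 30-second window is an assumption (the article does not say how segment boundaries are chosen), `transcribe_segment` is a placeholder for a real speech-to-text call, and a thread pool stands in for workers spread across multiple GPUs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s, segment_s=30.0):
    """Return (start, end) windows that cover the full duration."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration and segment length must be positive")
    bounds, start = [], 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_segment(bounds):
    # Hypothetical stand-in for inference on one time window.
    start, end = bounds
    return f"[{start:.0f}-{end:.0f}s] ..."


def transcribe_video(duration_s, segment_s=30.0, max_workers=4):
    segments = split_into_segments(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so the text stays in sequence.
        return list(pool.map(transcribe_segment, segments))


parts = transcribe_video(75.0)
```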

Concrete Benefits: Our EDA Success Story

Adopting this system revolutionized the management of complex AI workflows within the Gcore Video Streaming backend. Specifically, the EDA enabled us to reduce analysis time, parallelize AI tasks, scale AI workers independently, and ensure system flexibility.

  • Reduce analysis time: By utilizing EDA, we dramatically decreased the time required to analyze a single video with a set of pre-trained models. This means faster processing of videos for tasks like subtitle generation and content moderation.
  • Parallelize AI tasks: Parallel processing of AI tasks means breaking down complex processes into smaller, manageable tasks that can be executed concurrently. This approach sped up the overall process and optimized the use of computational resources.
  • Scale AI workers independently: Understanding the diverse demands of different AI tasks, our architecture scales AI workers based on the specific requirements of each task. For instance, a single request for subtitle generation might trigger one task for Pyannote (for voice activity detection) and potentially 100 tasks for Whisper (for speech-to-text), with only the latter requiring dynamic scaling due to higher demand.
  • Ensure system flexibility: We aimed to create a highly flexible system capable of quickly adapting to any new AI request. This required the ability to load models in an ad-hoc manner, ensuring our system could immediately respond to and serve new or evolving AI demands without significant reconfiguration.
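Ad-hoc model loading, mentioned in the last point, can be approximated with a lazy registry: a model is registered cheaply and loaded only when the first request for it arrives. The `ModelRegistry` class and the loader below are hypothetical and only sketch the idea.

```python
class ModelRegistry:
    """Load models on first use and cache them for later requests."""

    def __init__(self):
        self._loaders = {}
        self._loaded = {}

    def register(self, name, loader):
        # Registering is cheap: nothing is loaded yet.
        self._loaders[name] = loader

    def get(self, name):
        if name not in self._loaders:
            raise KeyError(f"no model registered under {name!r}")
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]


load_counts = {"whisper": 0}


def load_whisper():
    """Hypothetical loader; a real one would read model weights from storage."""
    load_counts["whisper"] += 1
    return lambda audio: f"transcript of {audio}"


registry = ModelRegistry()
registry.register("whisper", load_whisper)
first = registry.get("whisper")("a.wav")
second = registry.get("whisper")("b.wav")
```

Adding support for a new model then means registering one more loader rather than reconfiguring the workers.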

We Made These Mistakes So You Don’t Have To

Sharing is caring: Here are three things to keep in mind when setting up your own EDA for AI workflows to get the best results right away.

  • Avoid common pitfalls: Design the system with fault tolerance in mind from the outset. Anticipate potential failures in individual components and ensure that the architecture can gracefully handle these incidents without disrupting the overall workflow. Effective error handling and retry mechanisms are essential.
  • Choose the correct topology: Implementing a mediator topology can significantly simplify the implementation of business logic and improve the modularity and reusability of AI models. We initially employed a broker topology, but encountered limitations in managing complex AI tasks due to its linear communication model. To address these challenges and improve our system’s scalability and modularity, we transitioned to a mediator topology. This change introduced a central mediator to manage AI business logic and orchestrate events, allowing components to operate independently and more efficiently. The shift streamlined the development process and significantly enhanced the system’s adaptability and robustness.
  • Plan for rapid integration: Flexibility is key in any architecture designed for AI workflows. Allow for the quick addition and integration of new models into end services, essential in this fast-evolving field, where the ability to swiftly adopt and deploy new models can provide a significant competitive advantage.
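The mediator topology from the second point can be sketched in a few lines. In this illustration the `Mediator` class, the workflow names, and the worker functions are all hypothetical; the point is only that the order of steps lives in one place, while the workers know nothing about each other.

```python
class Mediator:
    """Central coordinator that owns workflow definitions."""

    def __init__(self, workers):
        self.workers = workers
        self.workflows = {}

    def define(self, name, steps):
        self.workflows[name] = list(steps)

    def run(self, name, payload):
        events = []
        for step in self.workflows[name]:
            # Each completion event triggers dispatch of the next step.
            payload = self.workers[step](payload)
            events.append(f"{step}.completed")
        return payload, events


# Hypothetical workers: each takes a payload and returns an enriched copy.
workers = {
    "vad": lambda p: {**p, "segments": 2},
    "transcribe": lambda p: {**p, "text": "hello"},
    "translate": lambda p: {**p, "text_es": "hola"},
}

mediator = Mediator(workers)
mediator.define("subtitles", ["vad", "transcribe"])
mediator.define("translated_subtitles", ["vad", "transcribe", "translate"])

result, events = mediator.run("translated_subtitles", {"video_id": "v1"})
```

Both workflows reuse the same workers, and adding a new model to an end service only requires a new worker entry and an updated step list.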

Future Directions in Event-Driven AI Architectures

We’re always looking to the future and innovating our EDA AI systems at Gcore. Two future directions look particularly promising.

Continuous Learning and Adaptation

Incorporating mechanisms for continuous learning and model adaptation requires periodically updating models with new data and, less obviously, dynamically adjusting workflows and processes based on real-time performance metrics and feedback loops. As AI models continue to grow in complexity and capability, developing robust systems for continuous evaluation and deployment becomes critical. This includes automated performance monitoring, version control, and seamless deployment of updated models without disrupting service.

Embracing LLMs and GAI

Our architecture needs to adapt as AI changes. While the rise of large language models (LLMs) and generative AI (GAI) might suggest that traditional AI inference workflows could become obsolete, the reality is that our proposed architecture supports critical areas of AI deployment, such as continuous model learning and evaluation. Our event-driven system’s flexibility makes it well-suited to integrate LLMs for enhanced decision-making processes and to adapt workflows in response to the capabilities of GAI, where several specialized AI models will increasingly be replaced by a single, more powerful one.

Conclusion

We’ve found that adopting an EDA for workflow processing offers significant benefits for scalability, reliability, and efficiency in managing complex AI systems in cloud and streaming environments. This approach addresses critical challenges, including dynamic scaling of large ML models, system robustness, and latency reduction. EDA is already proving itself essential for the evolution of scalable and efficient AI systems.

To experience the end product for yourself, check out Gcore Video Streaming and its impressive AI features, including transcription, translation, content moderation, and object recognition.

Try Gcore Video Streaming free for 14 days

Related articles

Protecting networks at scale with AI security strategies

Network cyberattacks are no longer isolated incidents. They are a constant, relentless assault on network infrastructure, probing for vulnerabilities in routing, session handling, and authentication flows. With AI at their disposal, threat actors can move faster than ever, shifting tactics mid-attack to bypass static defenses.Legacy systems, designed for simpler threats, cannot keep pace. Modern network security demands a new approach, combining real-time visibility, automated response, AI-driven adaptation, and decentralized protection to secure critical infrastructure without sacrificing speed or availability.At Gcore, we believe security must move as fast as your network does. So, in this article, we explore how L3/L4 network security is evolving to meet new network security challenges and how AI strengthens defenses against today’s most advanced threats.Smarter threat detection across complex network layersModern threats blend into legitimate traffic, using encrypted command-and-control, slow drip API abuse, and DNS tunneling to evade detection. Attackers increasingly embed credential stuffing into regular login activity. Without deep flow analysis, these attempts bypass simple rate limits and avoid triggering alerts until major breaches occur.Effective network defense today means inspection at Layer 3 and Layer 4, looking at:Traffic flow metadata (NetFlow, sFlow)SSL/TLS handshake anomaliesDNS request irregularitiesUnexpected session persistence behaviorsGcore Edge Security applies real-time traffic inspection across multiple layers, correlating flows and behaviors across routers, load balancers, proxies, and cloud edges. 
Even slight anomalies in NetFlow exports or unexpected east-west traffic inside a VPC can trigger early threat alerts.By combining packet metadata analysis, flow telemetry, and historical modeling, Gcore helps organizations detect stealth attacks long before traditional security controls react.Automated response to contain threats at network speedDetection is only half the battle. Once an anomaly is identified, defenders must act within seconds to prevent damage.Real-world example: DNS amplification attackIf a volumetric DNS amplification attack begins saturating a branch office's upstream link, automated systems can:Apply ACL-based rate limits at the nearest edge routerFilter malicious traffic upstream before WAN degradationAlert teams for manual inspection if thresholds escalateSimilarly, if lateral movement is detected inside a cloud deployment, dynamic firewall policies can isolate affected subnets before attackers pivot deeper.Gcore’s network automation frameworks integrate real-time AI decision-making with response workflows, enabling selective throttling, forced reauthentication, or local isolation—without disrupting legitimate users. Automation means threats are contained quickly, minimizing impact without crippling operations.Hardening DDoS mitigation against evolving attack patternsDDoS attacks have moved beyond basic volumetric floods. Today, attackers combine multiple tactics in coordinated strikes. Common attack vectors in modern DDoS include the following:UDP floods targeting bandwidth exhaustionSSL handshake floods overwhelming load balancersHTTP floods simulating legitimate browser sessionsAdaptive multi-vector shifts changing methods mid-attackReal-world case study: ISP under hybrid DDoS attackIn recent years, ISPs and large enterprises have faced hybrid DDoS attacks blending hundreds of gigabits per second of L3/4 UDP flood traffic with targeted SSL handshake floods. 
Attackers shift vectors dynamically to bypass static defenses and overwhelm infrastructure at multiple layers simultaneously. Static defenses fail in such cases because attackers change vectors every few minutes.Building resilient networks through self-healing capabilitiesEven the best defenses can be breached. When that happens, resilient networks must recover automatically to maintain uptime.If BGP route flapping is detected on a peering session, self-healing networks can:Suppress unstable prefixesReroute traffic through backup transit providersPrevent packet loss and service degradation without manual interventionSimilarly, if a VPN concentrator faces resource exhaustion from targeted attack traffic, automated scaling can:Spin up additional concentratorsRedistribute tunnel sessions dynamicallyMaintain stable access for remote usersGcore’s infrastructure supports self-healing capabilities by combining telemetry analysis, automated failover, and rapid resource scaling across core and edge networks. This resilience prevents localized incidents from escalating into major outages.Securing the edge against decentralized threatsThe network perimeter is now everywhere. Branches, mobile endpoints, IoT devices, and multi-cloud services all represent potential entry points for attackers.Real-world example: IoT malware infection at the branchMalware-infected IoT devices at a branch office can initiate outbound C2 traffic during low-traffic periods. Without local inspection, this activity can go undetected until aggregated telemetry reaches the central SOC, often too late.Modern edge security platforms deploy the following:Real-time traffic inspection at branch and edge routersBehavioral anomaly detection at local points of presenceAutomated enforcement policies blocking malicious flows immediatelyGcore’s edge nodes analyze flows and detect anomalies in near real time, enabling local containment before threats can propagate deeper into cloud or core systems. 
Decentralized defense shortens attacker dwell time, minimizes potential damage, and offloads pressure from centralized systems.How Gcore is preparing networks for the next generation of threatsThe threat landscape will only grow more complex. Attackers are investing in automation, AI, and adaptive tactics to stay one step ahead. Defending modern networks demands:Full-stack visibility from core to edgeAdaptive defense that adjusts faster than attackersAutomated recovery from disruption or compromiseDecentralized detection and containment at every entry pointGcore Edge Security delivers these capabilities, combining AI-enhanced traffic analysis, real-time mitigation, resilient failover systems, and edge-to-core defense. In a world where minutes of network downtime can cost millions, you can’t afford static defenses. We enable networks to protect critical infrastructure without sacrificing performance, agility, or resilience.Move faster than attackers. Build AI-powered resilience into your network with Gcore.Check out our docs to see how DDoS Protection protects your network

Introducing Gcore for Startups: created for builders, by builders

Building a startup is tough. Every decision about your infrastructure can make or break your speed to market and burn rate. Your time, team, and budget are stretched thin. That’s why you need a partner that helps you scale without compromise.At Gcore, we get it. We’ve been there ourselves, and we’ve helped thousands of engineering teams scale global applications under pressure.That’s why we created the Gcore Startups Program: to give early-stage founders the infrastructure, support, and pricing they actually need to launch and grow.At Gcore, we launched the Startups Program because we’ve been in their shoes. We know what it means to build under pressure, with limited resources, and big ambitions. We wanted to offer early-stage founders more than just short-term credits and fine print; our goal is to give them robust, long-term infrastructure they can rely on.Dmitry Maslennikov, Head of Gcore for StartupsWhat you get when you joinThe program is open to startups across industries, whether you’re building in fintech, AI, gaming, media, or something entirely new.Here’s what founders receive:Startup-friendly pricing on Gcore’s cloud and edge servicesCloud credits to help you get started without riskWhite-labeled dashboards to track usage across your team or customersPersonalized onboarding and migration supportGo-to-market resources to accelerate your launchYou also get direct access to all Gcore products, including Everywhere Inference, GPU Cloud, Managed Kubernetes, Object Storage, CDN, and security services. They’re available globally via our single, intuitive Gcore Customer Portal, and ready for your production workloads.When startups join the program, they get access to powerful cloud and edge infrastructure at startup-friendly pricing, personal migration support, white-labeled dashboards for tracking usage, and go-to-market resources. 
Everything we provide is tailored to the specific startup’s unique needs and designed to help them scale faster and smarter.Dmitry MaslennikovWhy startups are choosing GcoreWe understand that performance and flexibility are key for startups. From high-throughput AI inference to real-time media delivery, our infrastructure was designed to support demanding, distributed applications at scale.But what sets us apart is how we work with founders. We don’t force startups into rigid plans or abstract SLAs. We build with you 24/7, because we know your hustle isn’t a 9–5.One recent success story: an AI startup that migrated from a major hyperscaler told us they cut their inference costs by over 40%…and got actual human support for the first time. What truly sets us apart is our flexibility: we’re not a faceless hyperscaler. We tailor offers, support, and infrastructure to each startup’s stage and needs.Dmitry MaslennikovWe’re excited to support startups working on AI, machine learning, video, gaming, and real-time apps. Gcore for Startups is delivering serious value to founders in industries where performance, cost efficiency, and responsiveness make or break product experience.Ready to scale smarter?Apply today and get hands-on support from engineers who’ve been in your shoes. If you’re an early-stage startup with a working product and funding (pre-seed to Series A), we’ll review your application quickly and tailor infrastructure that matches your stage, stack, and goals.To get started, head on over to our Gcore for Startups page and book a demo.Discover Gcore for Startups

Announcing a new AI-optimized data center in Southern Europe

Good news for businesses operating in Southern Europe! Our newest cloud regions in Sines, Portugal, give you faster, more local access to the infrastructure you need to run advanced AI, ML, and HPC workloads across the Iberian Peninsula and wider region. Sines-2 marks the first region launched in partnership with Northern Data Group, signaling a new chapter in delivering powerful, workload-optimized infrastructure across Europe. And Sines-3 expands capacity and availability for the region.Strategically positioned in Portugal, Sines-2 and Sines-3 enhance coverage in Southern Europe, providing a lower-latency option for customers operating in or targeting this region. With the explosive growth of AI, machine learning, and compute-intensive workloads, these new regions are designed to meet escalating demand with cutting-edge GPU and storage capabilities.You can activate Sines-2 and Sines-3 for GPU Cloud or Everywhere Inference today with just a few clicks.Built for AI, designed to scaleSines-2 and Sines-3 bring with them next-generation infrastructure features, purpose-built for today's most demanding workloads:NVIDIA H100 GPUs: Unlock the full potential of AI/ML training, high-performance computing (HPC), and rendering workloads with access to H100 GPUs.VAST NFS (file sharing protocol) support: Benefit from scalable, high-throughput file storage ideal for data-intensive operations, research, and real-time AI workflows.IaaS portfolio: Deploy Virtual Machines, manage storage, and scale infrastructure with the same consistency and reliability as in our flagship regions.Organizations operating in Portugal, Spain, and nearby regions can now deploy workloads closer to end users, improving application performance. For finance, healthcare, public sector, and other organisations running sensitive workloads that must stay within a country or region, Sines-2 and Sines-3 are easy ways to access state-of-the-art GPUs with simplified compliance. 
Whether you're building AI models, running simulations, or managing rendering pipelines, Sines-2 and Sines-3 offer the performance, capacity, availability, and proximity you need.And best of all, servers are available and ready to deploy today.Run your AI workloads in Portugal todayWith these new Sines regions and our partnership with Northern Data Group, we're making it easier than ever for you to run AI workloads at scale. If you need speed, flexibility, and global reach, we're ready to power your next AI breakthrough.Unlock the power of Sines-2 and Sines-3 today

GTC Europe 2025: watch Seva Vayner on European AI trends

Inference is becoming Europe’s core AI workload. Telcos are moving fast on low-latency infrastructure. Data sovereignty is shaping every deployment decision.

At GTC Europe, these trends were impossible to miss. The conversation has moved beyond experimentation to execution, with exciting, distinctly European priorities shaping conversations.

Gcore’s own Seva Vayner, Product Director of Edge Cloud and AI, shared his take on this year’s event during GTC. He sees a clear shift in what European enterprises are asking for and what the ecosystem is ready to deliver.

Scroll on to watch the interview and see where AI in Europe is heading.

“It’s really a pleasure to see GTC in Europe”

After years of global AI strategy being shaped primarily by the US and China, Europe is carving its own path. Seva notes that this year’s GTC Europe wasn’t just a regional spin-off; it marked the emergence of a distinctly European voice in AI development.

“First of all, it's really a pleasure to see that GTC in Europe happened, and that a lot of European companies came together to have the conversation and build the ecosystem.”

As Seva notes, the real excitement came from watching European players collaborate. The focus was less on following global trends and more on co-creating the region’s own AI trajectory.

“Inference workloads will grow significantly in Europe”

Inference was a throughline across nearly every session. As Seva points out, Europe is still at the early stages of adopting inference at scale, but the shift is happening fast.

“Europe is only just starting its journey into inference, but we already see the trend. Over the next 5 to 10 years, inference workloads will grow significantly. That’s why GTC Europe is becoming a permanent, yearly event.”

This growth won’t just be driven by startups. Enterprises, governments, and infrastructure providers are all waking up to the importance of real-time, regional inference capabilities.

“There’s real traction. Companies are more and more interested in how to deliver low-latency inference. In a few years, this will be one of the most crucial workloads for any GPU cloud in Europe.”

“Telcos are getting serious about AI”

One of the clearest signs of maturity at GTC Europe was that telcos and CSPs are actively looking to deploy AI. And they’re asking the hard questions about how to integrate it into their infrastructure at a vast scale.

“One of the most interesting things is how telcos are thinking about adopting AI workloads on their infrastructure to deliver low latency. Sovereignty is crucial, especially for customers looking to serve training or inference workloads inside their region. And also user experience: how can I get GPU capacity in clusters, or deliver inference in just a few clicks?”

This theme (fast, sovereign, self-service AI) popped up again and again. Telcos and service providers want frictionless deployment and local control.

“Companies are struggling most with data”

While model deployment and infrastructure strategy took center stage, Seva reminds us that data processing and storage remain the bottleneck. Enterprises know they need to adopt AI, but they’re still navigating where and how to store and process the data that fuels it.

“One of the biggest struggles for end customers is the data: where it’s processed, where it’s stored, and what kind of capabilities are available. From a European perspective, we already see more and more companies looking for sovereign data privacy and simple, mature solutions for end users.”

That’s a familiar challenge for enterprises operating under GDPR, NIS2, and other compliance frameworks. The new wave of AI infrastructure has to be built for performance and for trust.

AI in Europe: responsible, scalable, and local

Seva’s key takeaway is that AI in Europe is no longer about catching up; it’s about doing it differently. The questions have changed from “Should we do AI?” to “How do we scale it responsibly, reliably, and locally?”

From sovereign deployment to edge-first infrastructure, GTC Europe 2025 showed that inference is the foundation of how European businesses plan to run AI. “The ecosystem is coming together,” explains Seva. “And the next five years will be crucial for defining how AI will work: not just in the cloud, but everywhere.”

If you’re looking to reduce latency, cut costs, and stay compliant while deploying AI in production, we invite you to download our free ebook, The inference optimization playbook.

Download our free inference optimization playbook

Gcore and Orange Business launch innovation program piloting joint solution to deliver sovereign inference as a service

Gcore and Orange Business have kicked off a strategic co-innovation program with the mission to deliver a scalable, production-grade AI inference service that is sovereign by design. By combining Orange Business’ secure, trusted cloud infrastructure and Gcore’s AI inference private deployment service, the collaboration empowers European enterprises and public sector organizations to run inference workloads at scale, without compromising on latency, control, or compliance.

Gcore’s AI inference private deployment service is already live on Orange Business’ Cloud Avenue infrastructure. Selected enterprises across industries are actively testing it in real-world scenarios. These pilot customers are exploring how fast, secure, and compliant inference can accelerate their AI projects, cut deployment times, and reduce infrastructure overhead.

The prototype will be demonstrated at NVIDIA GTC Paris, at the Taiga Cloud booth G26. Stop by any time to see it in action.

The inference supercycle is underway

By 2030, inference will comprise 70% of enterprise AI workloads. Telcos are well positioned to lead this shift due to their dense edge presence, licensed national data infrastructure, and long-standing trust relationships.

Gcore’s inference solution provides a sovereign, edge-native inference layer. It enables users to serve real-time, GPU-intensive applications like agentic AI, trusted LLMs, computer vision, and predictive analytics, all while staying compliant with Europe’s evolving data and AI governance frameworks.

From complexity to three clicks

Enterprise AI doesn’t need to be hard. Deploying inference workloads at scale used to demand Kubernetes fluency, large MLOps teams, and costly trial-and-error.

Now? It’s just three clicks:

  1. Pick a model: Choose from NVIDIA NIMs, open source, or proprietary libraries.
  2. Choose a region: Select one of Orange Business’ accredited EU data centers.
  3. Deploy: See your workloads go live in under 10 seconds.

Enterprises can launch inference projects faster, test ideas more quickly, and deliver production-ready AI services without spending months on ML plumbing.

Explore our blog to watch a demo showing how enterprises can deploy inference workloads in just three clicks and ten seconds.

Sovereign by design

All model data, logs, and inference results are stored exclusively within Orange Business’ own data centers in France, Germany, Norway, and Sweden. Cross-border data transfer is opt-in only, helping ensure alignment with GDPR, sector-specific regulations, and the forthcoming EU AI Act.

This platform is built for trust, transparency, and sovereignty by default. Customers maintain full control over their data, with governance baked into every layer of the deployment.

Performance without trade-offs

Gcore’s AI inference solution avoids the latency spikes, cold starts, and resource waste common in traditional cloud AI setups. Key design features include:

  • Smart GPU routing: Directs each request to the nearest in-region GPU, delivering real-time performance with sub-50ms latency.
  • Pre-loaded models: Reduces cold start delays and improves response times.
  • Secure multi-tenancy: Isolates customer data while maximizing infrastructure efficiency.

The result is a production-ready inference platform optimized for both performance and compliance.

Powering the future of AI infrastructure

This partnership marks a step forward for Europe’s sovereign AI capabilities. It highlights how telcos can serve as the backbone of next-generation AI infrastructure, hosting, scaling, and securing workloads at the edge.

With hundreds of edge POPs, trusted national networks, and deep ties across vertical industries, Orange Business is uniquely positioned to support a broad range of use cases, including real-time customer service AI, fraud detection, healthcare diagnostics, logistics automation, and public sector digital services.

What’s next: validating real-world performance

This phase of the Gcore and Orange Business program is focused on validating the solution through live customer deployments and performance benchmarks. Orange Business will gather feedback from early access customers to shape its future sovereign inference service offering. These insights will drive refinements and shape the roadmap ahead of a full commercial launch planned for later this year.

Gcore and Orange Business are committed to delivering a sovereign inference service that meets Europe’s highest standards for speed, simplicity, and trust. This co-innovation program lays the foundation for that future.

Ready to discover how Gcore and Orange Business can deliver sovereign inference as a service for your business?

Request a preview

Why on-premises AI is making a comeback

In recent years, cloud AI infrastructure has soared in popularity. With its scalability and ease of deployment, it’s no surprise that organizations rushed to transfer their data to the cloud in a bid to become “cloud-first.”

But now, the tide is turning.

As AI workloads grow more complex and regulatory pressures increase, many companies are reconsidering their reliance on cloud and turning back toward on-premises AI infrastructure.

Rather than doubling down on the cloud, organizations are diversifying, adopting multi-cloud models, sovereign cloud environments, and even hybrid or fully on-prem setups. The era of a single cloud provider handling everything is coming to an end. Why? Control, security, and performance are hard to find in the public cloud.

Here’s why more businesses are bringing AI back in-house.

#1 Enhanced data security and control

Data security remains one of the most urgent concerns driving the return to on-prem infrastructure.

For sensitive or high-priority workloads, common in sectors like finance, healthcare, and government, keeping data off the cloud is often non-negotiable. Cloud computing inherently increases risk by exposing data to shared environments, wider attack surfaces, and complex supply chains.

Choosing a trusted cloud provider can mitigate some of those risks. But it can’t replace the peace of mind that comes from keeping sensitive data in-house.

With on-premises AI, organizations gain fine-grained access control. Encryption keys remain internal and breach exposure shrinks dramatically. It’s also much easier to stay compliant with privacy laws when data never leaves your own secure perimeter.

For industries where trust and confidentiality are everything, on-prem solutions offer full visibility into where and how data is stored and processed.

#2 Performance enhancement and latency reduction

Latency matters, especially in AI.

On-premises AI systems excel in environments that require real-time performance and heavy compute loads. Processing data locally avoids the physical delays caused by transferring it across the internet to a cloud data center.

By eliminating long-haul network hops, companies get near-instant access to computing resources. They also get to fine-tune their internal networks, using private fiber, low-hop switching, and other low-latency optimizations that cloud customers can’t control.

Unlike multi-tenant cloud platforms, on-prem resources aren’t shared. That means consistently low, predictable latency.

This is vital for use cases where milliseconds, or even microseconds, make a difference: autonomous vehicles, real-time analytics, robotic control systems, and high-speed trading. Fast feedback loops and localized processing enable better outcomes, tighter control, and faster decision-making at the edge.

#3 Regulatory compliance and data sovereignty

Around the world, data privacy regulations are tightening. For most organizations, compliance isn’t optional.

On-premises infrastructure helps keep data safely inside the organization’s network. This supports data sovereignty, ensuring that sensitive information remains subject only to local laws, not the policies of another country’s cloud provider.

It's also a powerful hedge against geopolitical instability.

While hyperscalers operate globally, they’re always headquartered somewhere. That makes their infrastructure vulnerable to political shifts, sanctions, or changes in international data law. Governments may require them to restrict access, share data, or cut off services entirely, especially to organizations in sanctioned or adversarial jurisdictions.

Businesses relying on these providers risk disruption when regulations change. On-premises infrastructure, by contrast, offers reliable continuity and greater control, especially in uncertain times.

#4 Cost control and operational benefits

Cloud pricing may look flexible, but costs can escalate quickly.

Data transfers, storage, and compute spikes all add up fast. In contrast, on-premises infrastructure provides a predictable Total Cost of Ownership (TCO). Although upfront CapEx is higher, OpEx remains more stable over time.

Organizations can invest in high-performance hardware tailored to their specific needs and amortize those costs across years. That means no surprise bills, no sudden price hikes, and no dependence on vendor pricing models.

Of course, running on-prem infrastructure comes with its own challenges. It demands specialized teams for deployment, maintenance, and support. These experts are costly to recruit and retain, but they’re critical to ensure uptime, security, and performance.

Still, for companies with relatively stable compute and storage needs, the long-term savings often outweigh the initial setup effort. On-prem also integrates more smoothly into existing IT workflows, without the need for internet access or additional network setup, which is another operational bonus.

#5 Proactive threat detection and automated responses

On-premises AI sometimes enables smarter, more customized security.

Advanced platforms can continuously analyze live data streams using machine learning to detect anomalies and predict threats. When something suspicious is flagged, the system can respond instantly by quarantining data, blocking traffic, and alerting security teams.

That kind of automation is essential for minimizing damage and downtime.

With full infrastructure control, organizations can deploy bespoke monitoring systems that align with their threat models. Deep packet inspection, real-time anomaly detection, and behavioral analytics can be easier to configure and maintain on-prem than in shared cloud environments.

These systems can also work seamlessly with WAAP and DDoS tools to detect and neutralize threats before they spread. The key is flexibility: whether on-prem or cloud-based, AI-driven security should adapt to your architecture and threat landscape, not the other way around.

End-to-end visibility can give security teams a clearer picture and faster response options than generic, one-size-fits-all public cloud security tools.

How to combine on-premises control with cloud scalability

Let’s be clear: on-premises AI isn’t perfect. It demands upfront investment. It requires skilled personnel to deploy and manage systems. And integrating AI into legacy environments takes thoughtful planning.

But today’s tools are helping bridge those gaps. Modern platforms reduce the need for constant manual intervention. They support real-time updates to threat models and detection logic. As a result, security teams can spend more time on strategy and less on maintenance.

Meanwhile, the cloud still plays an important role. It offers faster access to new tools, software updates, and next-gen GPU hardware.

That’s why many organizations are opting for a hybrid model.

Our recommendation: Keep your sensitive, high-priority workloads on-prem. Use the cloud for elastic scale and innovation. Together, they deliver the best of both worlds: performance, control, compliance, and flexibility.

Secure your digital infrastructure with Gcore on-premises AI inference

Whether you’re protecting sensitive data or running high-demand workloads, on-premises AI gives you the control and confidence you need. Securing sensitive data and managing high-demand workloads requires a level of control, performance, and predictability that only on-premises AI infrastructure delivers.

Gcore Everywhere Inference Private Deployment makes it easier than ever to bring powerful serverless AI inference capabilities directly into your physical environment. Designed for scalable global performance, Everywhere Inference enables robust and secure multi-tenant AI inference deployments across on-prem and cloud environments, helping you meet data sovereignty requirements, reduce latency, and streamline deployment.

Talk to us about your on-prem AI plans
