When Cloud Meets Intelligence: Inference AI as a Service


By Gcore

November 27, 2023



Inference AI is a specialized form of artificial intelligence that applies trained models to new data to make real-time decisions or predictions. When offered “as a service,” inference AI is cloud-based. This gives businesses real-time AI decision-making capabilities without in-house AI hardware or expertise. Outsourcing the inference workload to cloud services can save businesses the cost of building and maintaining on-premises infrastructure. It also lets them benefit from the latest advancements in AI technology. Let’s delve into the complexities of deploying an inference AI model, explore the journey from model training to deployment, and discover Gcore’s offerings.

AI Model Training and Inference

In the world of AI, there are two key operations: training and inference. Regular AI encompasses both of these tasks, learning from data and then making predictions or decisions based on that data. By contrast, inference AI focuses solely on the inference phase. After a model has been trained on a dataset, inference AI takes over to apply this model to new data to make immediate decisions or predictions.

This specialization makes inference AI invaluable in time-sensitive applications, such as autonomous vehicles and real-time fraud detection, where quick and accurate decisions are crucial. In a self-driving car, inference can rapidly analyze sensor data to make immediate driving decisions, keeping latency low and improving safety. In real-time fraud detection, inference AI can evaluate transactional data against historical patterns almost instantly to flag or block suspicious activity.
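The split between training and inference can be made concrete with a deliberately tiny example. This is a toy sketch, not a real fraud model. A simple z-score rule stands in for a trained model, and the names `train`, `infer` and `AmountModel` are hypothetical. Training runs once over historical amounts. Inference then scores each new transaction using only the stored statistics.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class AmountModel:
    """Toy 'trained model': summary statistics of historical transaction amounts."""
    mean: float
    stdev: float


def train(history: list[float]) -> AmountModel:
    # Training phase: runs offline, once, over historical data.
    return AmountModel(statistics.mean(history), statistics.stdev(history))


def infer(model: AmountModel, amount: float, threshold: float = 3.0) -> bool:
    # Inference phase: runs per request on new data and only reads the model.
    z_score = (amount - model.mean) / model.stdev
    return abs(z_score) > threshold  # True means "flag as suspicious"


if __name__ == "__main__":
    model = train([20.0, 35.5, 18.0, 42.0, 27.5, 31.0, 24.0])
    for amount in (29.0, 950.0):
        print(amount, "suspicious" if infer(model, amount) else "ok")
```

The expensive step (`train`) and the latency-critical step (`infer`) have very different resource profiles. That difference is why inference is often deployed and scaled separately.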

The Need for Streamlined AI Production Management

Managing AI production involves navigating a complex matrix of interconnected decisions and adjustments. From data center location to financial budgeting, each decision carries a ripple effect. In our experience at Gcore, we see that this field is still defining its rules; the road from model training to deployment is more of a maze than a straight path. In this section, we’ll review the key components that every AI production manager must carefully consider to optimize performance and efficiency.

Location and latency should be your first consideration in AI production. Choose the wrong data center location, and you’re setting yourself up for latency issues that can seriously degrade user experience. For example, if you’re operating in the EU but your data center is in the United States, the transatlantic data travel times can create noticeable delays—a non-starter for inference AI.
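A quick back-of-the-envelope calculation shows why distance matters. It assumes light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum). It also uses illustrative straight-line distances, while real fiber routes are longer and add queuing and processing delays on top. Under those assumptions, propagation alone sets a hard floor on round-trip time:

```python
# Assumption: light in optical fiber travels at roughly 200,000 km/s (about 2/3 of c).
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time from propagation delay alone."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances only; real fiber paths are longer than these.
    for route, km in [("same metro area", 50), ("EU to US East Coast", 6_500)]:
        print(f"{route}: at least {min_round_trip_ms(km):.1f} ms round trip")
```

For roughly 6,500 km, the floor is about 65 ms per round trip before any inference work starts. A request that needs several round trips multiplies that cost.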

Resource management demands real-time adaptability. CPUs, memory, and specialized accelerators (GPUs or TPUs) need constant tuning based on up-to-the-minute performance metrics. As you move from development to full-scale production, dynamic resource management becomes a necessity rather than a luxury, and it has to run around the clock.
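One common way to automate this tuning is a proportional scaling rule. The sketch below uses the formula applied by Kubernetes' HorizontalPodAutoscaler: desired = ceil(current_replicas × current_metric / target_metric). It skips scaling when the ratio is within a tolerance (Kubernetes defaults to 0.1). The function name and the replica bounds are hypothetical choices for illustration. The metric could be GPU utilization or request queue depth, depending on your setup.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 20,  # hypothetical upper bound for illustration
    tolerance: float = 0.1,
) -> int:
    """Proportional scaling rule in the style of Kubernetes' HorizontalPodAutoscaler."""
    ratio = current_utilization / target_utilization
    # Do nothing when the metric is already close to target, to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 90% utilization against a 60% target scale out to 6.
    print(desired_replicas(4, 90, 60))
```

The tolerance band and the min/max bounds matter as much as the formula. Without them, noisy metrics would make the fleet oscillate and costs would become hard to predict.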

Financial planning is tightly linked to operational efficiency. Accurate budget forecasts are crucial for long-term sustainability, particularly given the volatility of computational demands in response to user activity.
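Demand volatility feeds directly into the budget. Here is a minimal forecasting sketch. It assumes instances are billed per full hour and that capacity can be matched to demand hour by hour. The throughput and price figures are made-up inputs, not real Gcore pricing.

```python
def forecast_cost(
    hourly_requests: list[int],
    requests_per_instance_hour: int,
    price_per_instance_hour: float,
) -> float:
    """Estimate spend for a demand profile when capacity is adjusted every hour."""
    total = 0.0
    for requests in hourly_requests:
        # Ceiling division: a partially used instance is still billed in full.
        instances = -(-requests // requests_per_instance_hour)
        total += instances * price_per_instance_hour
    return total


if __name__ == "__main__":
    # Hypothetical profile: 100, 250 and 40 requests in three consecutive hours.
    print(forecast_cost([100, 250, 40], requests_per_instance_hour=100,
                        price_per_instance_hour=2.0))
```

For this toy profile, hour-by-hour scaling needs 1, 3 and 1 instances, for a total of 10.0. Provisioning for the peak all three hours (3 instances × 3 hours) would cost 18.0. The gap between the two is what accurate forecasting and dynamic resource management try to capture.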

Unlike the more mature landscape of software development, AI production management lacks a standardized playbook. This means you need to rely on bespoke expertise and be prepared for a higher error rate. The field is propelled by rapid innovation and by trial and error. In this sense, the sector is still in its adolescent phase: reckless, exciting, and still figuring out its standards.

How to Deploy an Inference AI Model

Now that we understand the key components of AI production management, let’s walk through a step-by-step guide for deploying an AI inference model, focusing on the integration of various tools and resources. The aim is to build an environment that ensures swift, efficient deployment and scaling. Here are some tools that will be essential for success:

Docker: An industry standard for containerization, aiding in the smooth deployment of your model.
Whisper: A leading AI model for speech-to-text that serves as the foundation of our service.
Simple Server Framework (SSF): This Graphcore tool facilitates the building and packaging (containerizing) of applications for serving.
Harbor: An open source container registry used to store Docker images, instrumental in our setup. Use the official docs to set it up.

Here’s what the pipeline looks like:

Preparation

Model: For this guide, we use a pre-trained model from Hugging Face. Training the model is outside the scope of this article.
Environment: We have a designated cluster for model building. All commands will be executed via SSH.

Step 1: Set Up a Virtual Environment

Create a virtual environment:

virtualenv .venv --prompt whisper:

Activate it:

source .venv/bin/activate

Step 2: Install Required Packages

Install SSF:

pip install https://github.com/graphcore/simple-server-framework/archive/refs/tags/v1.0.0.tar.gz

Install the Docker Buildx CLI plugin:

wget https://github.com/docker/buildx/releases/download/v0.11.2/buildx-v0.11.2.linux-amd64
mkdir -p ~/.docker/cli-plugins
mv buildx-v0.11.2.linux-amd64 ~/.docker/cli-plugins/docker-buildx
chmod u+x ~/.docker/cli-plugins/docker-buildx

Step 3: Codebase

Clone the Gcore repository that contains all the necessary files:

git clone https://github.com/G-Core/ai-code-examples.git

Switch to the branch for this example:

cd ai-code-examples && git checkout whisper-lux-small-ssf

Two key files here are `ssf_config.yaml` and `whisper_ssf_app.py`.

`ssf_config.yaml` is crucial for configuring the package that you’ll build. It contains fields specifying the name of the model, license and dependencies. It also outlines the inputs and outputs, detailing the endpoints and types of fields. For instance, for the Whisper model, the input is a temporary file (TempFile) and the output is a string (String). This information sets the framework for how your model will interact with users.

Example for Whisper:

endpoints:
  - id: asr
    version: 1
    desc: Simple application interface for Whisper
    custom: ~
    inputs:
      - id: file
        type: TempFile
        desc: Audio description text prompt
    outputs:
      - id: result
        type: String
        desc: Transcription of the text

SSF provides support for various data types. Detailed information can be found in its documentation.

`whisper_ssf_app.py` acts as a wrapper around your Whisper model, making it compatible with the Simple Server Framework (SSF). The script contains several essential methods:

`build`: This is where the model’s computational graph is constructed. It must run on a host with an IPU.
`startup`: Manages preliminary tasks before the model can begin serving user requests.
`request`: This is the heart of the system, responsible for processing user requests.
`shutdown`: Ensures graceful termination of the model, like completing ongoing requests.
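`is_healthy`: Reports whether the application is able to serve. This lets the model run both as a standalone Docker container and as part of larger, more complex systems like Kubernetes, where health checks decide whether an instance receives traffic or gets restarted.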
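To make the lifecycle concrete, here is a minimal sketch of how these five methods fit together as a state machine. It is a hypothetical stand-in: the class `ToyTranscriber` and its signatures are illustrative and are NOT SSF's real base class or method signatures. Check the SSF documentation and `whisper_ssf_app.py` for the actual interface.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    BUILT = auto()
    SERVING = auto()
    STOPPED = auto()


class ToyTranscriber:
    """Hypothetical app mirroring the build/startup/request/shutdown/is_healthy
    lifecycle. Not SSF's actual API."""

    def __init__(self) -> None:
        self.state = State.CREATED

    def build(self) -> None:
        # One-off, expensive preparation, such as compiling the model graph.
        self.state = State.BUILT

    def startup(self) -> None:
        # Load compiled artifacts and open resources before accepting traffic.
        if self.state is not State.BUILT:
            raise RuntimeError("build() must run before startup()")
        self.state = State.SERVING

    def request(self, audio: bytes) -> str:
        # Per-request work; a real app would run model inference here.
        if self.state is not State.SERVING:
            raise RuntimeError("application is not serving")
        return f"<transcript of {len(audio)} bytes>"

    def shutdown(self) -> None:
        # A real implementation would let in-flight requests finish first.
        self.state = State.STOPPED

    def is_healthy(self) -> bool:
        # What a Docker health check or Kubernetes probe would poll.
        return self.state is State.SERVING


if __name__ == "__main__":
    app = ToyTranscriber()
    app.build()
    app.startup()
    print(app.is_healthy(), app.request(b"\x00" * 16))
    app.shutdown()
```

Keeping `build` separate from `startup` means the expensive step can run once, at packaging time, rather than every time a container starts.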

Within the build method, the function `compile_or_load_model_exe` is invoked. This is pivotal when constructing a model’s computational graph on IPUs. Here’s the catch: Creating this graph requires an initial user request as input. While you could use the first real user request for this, keep in mind that graph-building could consume 1 to 2 minutes, possibly more. Given today’s user expectations for speed, this delay could be a deal-breaker. To navigate this, the build method is designed to accept our predefined data as the first request for constructing the graph. In this setup, we use `bootstrap.mp3` to mimic that inaugural request.
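The warm-up idea can be sketched in a few lines. This toy model compiles its graph lazily on the first call and reuses it afterwards. A `warm_up` helper feeds it a predefined input, playing the role of `bootstrap.mp3`, so no real user request pays the compile cost. The names `LazyCompiledModel` and `warm_up` are hypothetical. The real logic lives in `compile_or_load_model_exe`, whose internals are not shown here.

```python
class LazyCompiledModel:
    """Toy model whose graph is compiled on first use and then reused (hypothetical)."""

    def __init__(self) -> None:
        self.compiled = False
        self.compile_count = 0

    def _compile(self, sample: bytes) -> None:
        # Stand-in for the slow step (1 to 2 minutes or more on real hardware);
        # a real compiler would derive input shapes from the sample.
        self.compile_count += 1
        self.compiled = True

    def run(self, audio: bytes) -> str:
        if not self.compiled:
            self._compile(audio)  # whoever calls first pays the compile cost
        return f"<transcript of {len(audio)} bytes>"


def warm_up(model: LazyCompiledModel, bootstrap_audio: bytes) -> None:
    # Run a predefined input before accepting traffic, so compilation
    # never happens on a real user's request.
    model.run(bootstrap_audio)


if __name__ == "__main__":
    model = LazyCompiledModel()
    warm_up(model, b"bootstrap audio bytes")
    model.run(b"first real user request")
    print("compilations:", model.compile_count)
```

One trade-off applies: warm-up only helps if the bootstrap input exercises the same graph (input shapes and settings) as real traffic. Otherwise a later request can still trigger a fresh compilation.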

Step 4: Build and Publish the Container

Build and publish the container, specifying your own Docker registry address and credentials:

gc-ssf --config ssf_config.yaml build package publish --package-tag harbortest.cloud.gcorelabs.com/whisper/mkhl --docker-username gitlab --docker-password XXXXXXXXXX --container-server harbortest.cloud.gcorelabs.com

The resulting container holds all necessary components: the model, a FastAPI wrapper, and the bootstrap.mp3 for initial warmup. It will be pushed to the Harbor registry.

Step 5: Deploy to Edge Node

For deployment on the edge node, the following command is used:

gc-ssf --stdout-log-level DEBUG deploy --config ssf_config.yaml --deploy-platform Gcore --port 8100 --deploy-gcore-target-address ai-inference-cluster-1 --deploy-gcore-target-username ubuntu --docker-username gitlab --docker-password XXXXXXXXXXX --package-tag harbortest.cloud.gcorelabs.com/whisper/mkhl:latest --deploy-package --container-server harbortest.cloud.gcorelabs.com

`gc-ssf deploy` uses SSH to run commands on the target host, so key-based SSH access (an SSH key) must be set up between the build node and the target node.
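Once the container is running, you can smoke-test it from any machine that can reach the node. This standard-library-only client sketch makes several assumptions. It assumes the `TempFile` input is accepted as a multipart/form-data upload under the field name `file` (the input `id` from `ssf_config.yaml`). It also assumes the route is `/v1/asr`, derived from the endpoint `id` and `version` but not confirmed here. Port 8100 matches the deploy command. Verify the actual route your deployed service exposes before relying on it.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header).
    Note: filenames containing quotes are not escaped in this sketch."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(host: str, audio_path: str, port: int = 8100, path: str = "/v1/asr") -> str:
    # Assumption: route and field name follow ssf_config.yaml; adjust if they differ.
    with open(audio_path, "rb") as f:
        body, content_type = build_multipart("file", os.path.basename(audio_path), f.read())
    req = urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Generous timeout in case the service is still compiling or warming up.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.read().decode()
```

For example, `transcribe("ai-inference-cluster-1", "sample.mp3")` would post a local file to the node used in the deploy command above. The hostname must resolve from wherever you run the client.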

By following this pipeline, you establish a robust framework for deploying your AI models, ensuring they are not just efficient but also easily scalable and maintainable.

Inferring a More Intelligent Future

Inference AI’s growing role isn’t limited to tech giants; it’s vital for any organization aiming for agility and competitiveness. Investing in this technology means committing to a scalable, evolving answer to the data deluge. Inference AI as a service is poised to become an indispensable business tool because it abstracts away AI’s technical complexity. It offers a scalable, streamlined way to sift through mountains of data and extract meaningful, actionable insights.

How Gcore Uses Inference AI

Despite the surge in AI adoption, we recognize there’s still a gap in the market for specialized, out-of-the-box AI clusters that combine power with ease of deployment. Gcore provides infrastructure and low-latency services engineered to help businesses go global faster. This addresses one of the most significant challenges in the machine learning landscape: the transition from model development to scalable deployment. We use Graphcore’s Simple Server Framework to create an environment that’s capable not only of running machine learning models, but also of improving them continuously through inference AI.

Conclusion

Inference AI as a service can transform the way businesses operate, allowing them to make real-time decisions and predictions based on trained models. This cloud-based AI service streamlines the process of managing AI production, optimizing performance, and efficiently deploying AI models. It’s a tool with exciting prospects for any organization aiming to enhance its agility and competitiveness.

Gcore’s powerful, easy-to-deploy AI clusters provide the low latency and high performance required for effective inference AI as a service. Using Graphcore’s Simple Server Framework, Gcore creates an environment capable of running machine learning models and improving them continuously through inference AI. For a deeper understanding of how Gcore is shaping the AI ecosystem, explore our AI infrastructure documentation.


The cloud control gap: why EU companies are auditing jurisdiction in 2025

Europe’s cloud priorities are changing fast, and rightly so. With new regulations taking effect, concerns about jurisdictional control rising, and trust becoming a key differentiator, more companies are asking a simple question: Who really controls our data?For years, European companies have relied on global cloud giants headquartered outside the EU. These providers offered speed, scale, and a wide range of services. But 2025 is a different landscape.Recent developments have shown that data location doesn’t always mean data protection. A service hosted in an EU data center may still be subject to laws from outside the EU, like the US CLOUD Act, which could require the provider to hand over customer data regardless of where it’s stored.For regulated industries, government contractors, and data-sensitive businesses, that’s a growing problem. Sovereignty today goes beyond compliance. It’s central to business trust, operational transparency, and long-term risk management.Rising risks of non-EU cloud dependencyIn 2025, the conversation has shifted from “is this provider GDPR-compliant?” to “what happens if this provider is forced to act against our interests?”Here are three real concerns European companies now face:Foreign jurisdiction risk: Cloud providers based outside Europe may be legally required to share customer data with foreign authorities, even if it’s stored in the EU.Operational disruption: Geopolitical tensions or executive decisions abroad could affect service availability or create new barriers to access.Reputational and compliance exposure: Customers and regulators increasingly expect companies to use providers aligned with European standards and legal protections.European leaders are actively pushing for “full-stack European solutions” across cloud and AI infrastructure, citing sovereignty and legal clarity as top concerns. 
Leading European firms like Deutsche Telekom and Airbus have criticized proposals that would grant non-EU tech giants access to sensitive EU cloud data.This reinforces a broader industry consensus: jurisdictional control is a serious strategic issue for European businesses across industries. Relying on foreign cloud services introduces risks that no business can control, and that few can absorb.What European companies must do nextEuropean businesses can’t wait for disruption to happen. They must build resilience now, before potentially devastating problems occur.Audit their cloud stack to identify data locations and associated legal jurisdictions.Repatriate sensitive workloads to EU-based providers with clear legal accountability frameworks.Consider deploying hybrid or multi-cloud architectures, blending hyperscaler agility and EU sovereign assurance.Over 80% of European firms using cloud infrastructure are actively exploring or migrating to sovereign solutions. This is a smart strategic maneuver in an increasingly complex and regulated cloud landscape.Choosing a futureproof pathIf your business depends on the cloud, sovereignty should be part of your planning. It’s not about political trends or buzzwords. It’s about control, continuity, and credibility.European cloud providers like Gcore support organizations in achieving key sovereignty milestones:EU legal jurisdiction over dataAlignment with sectoral compliance requirementsResilience to legal and geopolitical disruptionTrust with EU customers, partners, and regulatorsIn 2025, that’s a serious competitive edge that shows your customers that you take their data protection seriously. A European provider is quickly becoming a non-negotiable for European businesses.Want to explore what digital sovereignty looks like in practice?Gcore’s infrastructure is fully self-owned, jurisdictionally transparent, and compliant with EU data laws. 
As a European provider, we understand the legal, operational, and reputational demands on EU businesses.Talk to us about sovereignty strategies for cloud, AI, network, and security that protect your data, your customers, and your business. We’re ready to provide a free, customized consultation to help your European business prepare for sovereignty challenges.Auditing your cloud stack is the first step. Knowing what to look for in a provider comes next.Not all EU-based cloud providers guarantee sovereignty. Learn what to evaluate in infrastructure, ownership, and legal control to make the right decision.Learn how to verify EU cloud control in our blog

Outpacing cloud‑native threats: How to secure distributed workloads at scale

The cloud never stops. Neither do the threats.Every shift toward containers, microservices, and hybrid clouds creates new opportunities for innovation…and for attackers. Legacy security, built for static systems, crumbles under the speed, scale, and complexity of modern cloud-native environments.To survive, organizations need a new approach: one that’s dynamic, AI-driven, automated, and rooted in zero trust.In this article, we break down the hidden risks of cloud-native architectures and show how intelligent, automated security can outpace threats, protect distributed workloads, and power secure growth at scale.The challenges of cloud-native environmentsCloud-native architectures are designed for maximum flexibility and speed. Applications run in containers that can scale in seconds. Microservices split large applications into smaller, independent parts. Hybrid and multi-cloud deployments stretch workloads across public clouds, private clouds, and on-premises infrastructure.But this agility comes at a cost. It expands the attack surface dramatically, and traditional perimeter-based security can’t keep up.Containers share host resources, which means if one container is breached, attackers may gain access to others on the same system. Microservices rely heavily on APIs to communicate, and every exposed API is a potential attack vector. Hybrid cloud environments create inconsistent security controls across platforms, making gaps easier for attackers to exploit.Legacy security tools, built for unchanging, centralized environments, lack the real-time visibility, scalability, and automated response needed to secure today’s dynamic systems. Organizations must rethink cloud security from the ground up, prioritizing speed, automation, and continuous monitoring.Solution #1: AI-powered threat detection forsmarter defensesModern threats evolve faster than any manual security process can track. Rule-based defenses simply can’t adapt fast enough.The solution? 
AI-driven threat detection.Instead of relying on static rules, AI models monitor massive volumes of data in real time, spotting subtle anomalies that signal an attack before real damage is done. For example, an AI-based platform can detect an unauthorized process in a container trying to access confidential data, flag it as suspicious, and isolate the threat within milliseconds before attackers can move laterally or exfiltrate information.This proactive approach learns, adapts, and neutralizes new attack vectors before they become widespread. By continuously monitoring system behavior and automatically responding to abnormal activity, AI closes the gap between detection and action, critical in cloud-native, regulated environments where even milliseconds matter.Solution #2: Zero trust as the new security baseline“Trust but verify” no longer cuts it. In a cloud-native world, the new rule is “trust nothing, verify everything”.Zero-trust security assumes that threats exist both inside and outside the network perimeter. Every request—whether from a user, device, or application—must be authenticated, authorized, and validated.In distributed architectures, zero trust isolates workloads, meaning even if attackers breach one component, they can’t easily pivot across systems. Strict identity and access management controls limit the blast radius, minimizing potential damage.Combined with AI-driven monitoring, zero trust provides deep, continuous verification, blocking insider threats, compromised credentials, and advanced persistent threats before they escalate.Solution #3: Automated security policies for scalingprotectionManual security management is impossible in dynamic environments where thousands of containers and microservices are spun up and down in real time.Automation is the way forward. 
AI-powered security policies can continuously analyze system behavior, detect deviations, and adjust defenses automatically, without human intervention.This eliminates the lag between detection and response, shrinks the attack window, and drastically reduces the risk of human error. It also ensures consistent security enforcement across all environments: public cloud, private cloud, and on-premises.For example, if a system detects an unusual spike in API calls, an automated security policy can immediately apply rate limiting or restrict access, shutting down the threat without impacting overall performance.Automation doesn’t just respond faster. It maintains resilience and operational continuity even in the face of complex, distributed threats.Unifying security across cloud environmentsSecuring distributed workloads isn’t just about having smarter tools, it’s about making them work together. Different cloud platforms, technologies, and management protocols create fragmentation, opening cracks that attackers can exploit. Security gaps between systems are as dangerous as the threats themselves.Modern cloud-native security demands a unified approach. Organizations need centralized platforms that pull real-time data from every endpoint, regardless of platform or location, and present it through a single management dashboard. This gives IT and security teams full, end-to-end visibility over threats, system health, and compliance posture. It also allows security policies to be deployed, updated, and enforced consistently across every environment, without relying on multiple, siloed tools.Unification strengthens security, simplifies operations, and dramatically reduces overhead, critical for scaling securely at cloud-native speeds. 
That’s why at Gcore, our integrated suite of products includes security for cloud, network, and AI workloads, all managed in a single, intuitive interface.Why choose Gcore for cloud-native security?Securing cloud-native workloads requires more than legacy firewalls and patchwork solutions. It demands dynamic, intelligent protection that moves as fast as your business does.Gcore Edge Security delivers robust, AI-driven security built for the cloud-native era. By combining real-time AI threat detection, zero-trust enforcement, automated responses, and compliance-first design, Gcore security solutions protect distributed applications without slowing down development cycles.Discover why WAAP is essential for cloud security in 2025

Announcing a new AI-optimized data center in Southern Europe

Good news for businesses operating in Southern Europe! Our newest cloud regions in Sines, Portugal, give you faster, more local access to the infrastructure you need to run advanced AI, ML, and HPC workloads across the Iberian Peninsula and wider region. Sines-2 marks the first region launched in partnership with Northern Data Group, signaling a new chapter in delivering powerful, workload-optimized infrastructure across Europe. And Sines-3 expands capacity and availability for the region.Strategically positioned in Portugal, Sines-2 and Sines-3 enhance coverage in Southern Europe, providing a lower-latency option for customers operating in or targeting this region. With the explosive growth of AI, machine learning, and compute-intensive workloads, these new regions are designed to meet escalating demand with cutting-edge GPU and storage capabilities.You can activate Sines-2 and Sines-3 for GPU Cloud or Everywhere Inference today with just a few clicks.Built for AI, designed to scaleSines-2 and Sines-3 bring with them next-generation infrastructure features, purpose-built for today's most demanding workloads:NVIDIA H100 GPUs: Unlock the full potential of AI/ML training, high-performance computing (HPC), and rendering workloads with access to H100 GPUs.VAST NFS (file sharing protocol) support: Benefit from scalable, high-throughput file storage ideal for data-intensive operations, research, and real-time AI workflows.IaaS portfolio: Deploy Virtual Machines, manage storage, and scale infrastructure with the same consistency and reliability as in our flagship regions.Organizations operating in Portugal, Spain, and nearby regions can now deploy workloads closer to end users, improving application performance. For finance, healthcare, public sector, and other organisations running sensitive workloads that must stay within a country or region, Sines-2 and Sines-3 are easy ways to access state-of-the-art GPUs with simplified compliance. 
Whether you're building AI models, running simulations, or managing rendering pipelines, Sines-2 and Sines-3 offer the performance, capacity, availability, and proximity you need.And best of all, servers are available and ready to deploy today.Run your AI workloads in Portugal todayWith these new Sines regions and our partnership with Northern Data Group, we're making it easier than ever for you to run AI workloads at scale. If you need speed, flexibility, and global reach, we're ready to power your next AI breakthrough.Unlock the power of Sines-2 and Sines-3 today

GTC Europe 2025: watch Seva Vayner on European AI trends

Inference is becoming Europe’s core AI workload. Telcos are moving fast on low-latency infrastructure. Data sovereignty is shaping every deployment decision.At GTC Europe, these trends were impossible to miss. The conversation has moved beyond experimentation to execution, with exciting, distinctly European priorities shaping conversations.Gcore’s own Seva Vayner, Product Director of Edge Cloud and AI, shared his take on this year’s event during GTC. He sees a clear shift in what European enterprises are asking for and what the ecosystem is ready to deliver.Scroll on to watch the interview and see where AI in Europe is heading.“It’s really a pleasure to see GTC in Europe”After years of global AI strategy being shaped primarily by the US and China, Europe is carving its own path. Seva notes that this year’s GTC Europe wasn’t just a regional spin-off. it marked the emergence of a distinctly European voice in AI development.“First of all, it's really a pleasure to see that GTC in Europe happened, and that a lot of European companies came together to have the conversation and build the ecosystem.”As Seva notes, the real excitement came from watching European players collaborate. The focus was less on following global trends and more on co-creating the region’s own AI trajectory.“Inference workloads will grow significantly in Europe”Inference was a throughline across nearly every session. As Seva points out, Europe is still at the early stages of adopting inference at scale, but the shift is happening fast.“Europe is only just starting its journey into inference, but we already see the trend. Over the next 5 to 10 years, inference workloads will grow significantly. That’s why GTC Europe is becoming a permanent, yearly event.”This growth won’t just be driven by startups. Enterprises, governments, and infrastructure providers are all waking up to the importance of real-time, regional inference capabilities.“There’s real traction. 
Companies are more and more interested in how to deliver low-latency inference. In a few years, this will be one of the most crucial workloads for any GPU cloud in Europe.”“Telcos are getting serious about AI”One of the clearest signs of maturity at GTC Europe was that telcos and CSPs are actively looking to deploy AI. And they’re asking the hard questions about how to integrate it into their infrastructure at a vast scale.“One of the most interesting things is how telcos are thinking about adopting AI workloads on their infrastructure to deliver low latency. Sovereignty is crucial, especially for customers looking to serve training or inference workloads inside their region. And also user experience: how can I get GPU capacity in clusters, or deliver inference in just a few clicks?”This theme—fast, sovereign, self-service AI—popped up again and again. Telcos and service providers want frictionless deployment and local control.“Companies are struggling most with data”While model deployment and infrastructure strategy took center stage, Seva reminds us that data processing and storage remains the bottleneck. Enterprises know they need to adopt AI, but they’re still navigating where and how to store and process the data that fuels it.“One of the biggest struggles for end customers is the data: where it’s processed, where it’s stored, and what kind of capabilities are available. From a European perspective, we already see more and more companies looking for sovereign data privacy and simple, mature solutions for end users.”That’s a familiar challenge for enterprises operating under GDPR, NIS2, and other compliance frameworks. The new wave of AI infrastructure has to be built for performance and for trust.AI in Europe: responsible, scalable, and localSeva’s key takeaway is that AI in Europe is no longer about catching up, it’s about doing it differently. 
The questions have changed from “Should we do AI?” to “How do we scale it responsibly, reliably, and locally?”From sovereign deployment to edge-first infrastructure, GTC Europe 2025 showed that inference is the foundation of how European businesses plan to run AI. “The ecosystem is coming together,” explains Seva. “And the next five years will be crucial for defining how AI will work: not just in the cloud, but everywhere.”If you’re looking to reduce latency, cut costs, and stay compliant while deploying AI in production, we invite you to download our free ebook, The inference optimization playbook.Download our free inference optimization playbook

Introducing FastEdge Triggers: real-time edge logic

When you're building real-time applications, whether for streaming platforms, SaaS dashboards, or security-sensitive services, you need content that adapts on the fly. Blocking suspicious IPs, injecting personalized content, transforming media on the edge—these should be fast, scalable, and reliable.Until now, they weren't.Developers and technical teams often had to work across multiple departments to create brittle, hardcoded solutions. Each use case, like watermarking video or rewriting headers, required a custom integration. There was no easy way to run logic dynamically at the edge. That changes with FastEdge Triggers.Real-time logic, built into the edgeFastEdge Triggers let you execute custom serverless logic at key moments in the HTTP lifecycle:on_request_headerson_request_bodyon_response_headerson_response_bodyFastEdge is built on the proxy-wasm standard, making it easy to adapt existing proxy-wasm applications (e.g., for Envoy or Kong) for use with Gcore. These trigger types align directly with proxy-wasm conventions, meaning less friction for developers familiar with modern proxy architectures.This means that you can now:Authenticate users' tokens, such as JWTBlock access by IP, region, or user agentInject CSS, HTML, or JavaScript into responsesTransform images or convert markdown to HTML before deliveryAdd security tokens or watermarks to video contentRewrite or sanitize request headers and bodiesNo backend round-trips. No manual routing. Just real-time, programmable edge behavior, backed by Gcore's global infrastructure.While FastEdge enables instant logic execution at the edge, response-stage triggers (on_response_headers and on_response_body) naturally depend on receiving data from the origin before acting. Even so, transformations happen at the edge, reducing backend load and improving overall efficiency.Our architecture means that FastEdge logic is executed in ultra-low-latency environments, tightly coupled with CDN. 
Triggers can be layered across multiple stages of a request without performance degradation.Built for developersFastEdge Triggers were built to solve three core pain points for technical teams:Hard to scale: Custom logic used to require bespoke, team-specific workaroundsHard to maintain: Even single-team solutions became brittle without proper edge infrastructureLimited flexibility: Legacy CDN logic couldn't support complex, dynamic behaviorWith FastEdge, developers have full control: no DevOps bottlenecks, no workarounds, no compromises. Logic runs at the edge, not your origin, minimizing backend exposure. FastEdge apps execute in isolated, sandboxed environments, reducing the risk of vulnerabilities that might otherwise be introduced when logic runs on central infrastructure.How it works behind the scenesEach FastEdge application is written in Rust or AssemblyScript and connected to the HTTP request lifecycle through Gcore's configuration interface. Apps are linked to trigger types through the CDN resource settings page in the Gcore Customer Portal.Configuring FastEdge Triggers from the CDN resource settings screen in the Gcore Customer PortalHere's what happens under the hood:You assign a FastEdge app to a trigger point.Our Core Proxy detects that trigger and automatically routes execution through your custom logic.The result is returned before hitting cache or origin, modified, enriched, and secured.This flow is deeply integrated with our CDN, delivering minimal latency with zero friction.A sequence diagram showing how FastEdge Triggers works under the hood A real-life use case: markdown to HTML at the edgeHere's a real-world example that shows how FastEdge Triggers can power multi-step content transformation without a single backend server.One customer wanted to serve Markdown-based documentation as styled HTML, without spinning up infrastructure. 
Using this FastEdge app written in Rust, they achieved just that.
The app listens at three trigger points: on_request_headers, on_response_headers, and on_response_body.
It detects requests for .md files and converts them on the fly.
The HTML is served directly via the CDN, with no origin compute required.

You can see it live here:
README rendered
Terraform docs rendered

This use case showcases FastEdge's ability to orchestrate multi-stage logic at the edge, making it ideal for serverless documentation, lightweight rendering, or content transformation pipelines.

Ready to build smarter at the edge?

FastEdge Triggers are available now for all FastEdge customers. If you're looking to modernize your edge logic, simplify your architecture, and ship faster with fewer backend dependencies, FastEdge is built for you.

Reach out to your account manager or contact us to activate FastEdge Triggers in your environment.

Try FastEdge Triggers
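
The Markdown-to-HTML use case described above maps onto three trigger points. Below is a minimal, standard-library-only Rust sketch of the logic each stage might run; it is not the customer's actual app, and the function names are hypothetical. The converter deliberately handles only ATX headings and paragraphs, whereas a production app would use a full CommonMark-compliant parser.

```rust
use std::collections::HashMap;

/// on_request_headers: decide whether this request should be transformed.
/// The query string is ignored, so "/README.md?v=2" still matches.
fn is_markdown_path(path: &str) -> bool {
    let path = path.split('?').next().unwrap_or(path);
    path.to_ascii_lowercase().ends_with(".md")
}

/// on_response_headers: the body is about to change, so set an HTML content type
/// and drop the origin's Content-Length, which will no longer match.
fn rewrite_headers(headers: &mut HashMap<String, String>) {
    headers.insert("content-type".to_string(), "text/html; charset=utf-8".to_string());
    headers.remove("content-length");
}

/// Escape the characters that are significant in HTML text.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// on_response_body: convert a tiny Markdown subset (ATX headings and paragraphs).
fn markdown_to_html(src: &str) -> String {
    // Emit buffered paragraph lines as a single <p> element.
    fn flush(paragraph: &mut Vec<&str>, html: &mut String) {
        if !paragraph.is_empty() {
            html.push_str("<p>");
            html.push_str(&escape_html(&paragraph.join(" ")));
            html.push_str("</p>\n");
            paragraph.clear();
        }
    }

    let mut html = String::new();
    let mut paragraph: Vec<&str> = Vec::new();
    for line in src.lines() {
        let trimmed = line.trim();
        // Count leading '#' characters to detect headings "# " through "###### ".
        let level = trimmed.chars().take_while(|&c| c == '#').count();
        if (1..=6).contains(&level) && trimmed[level..].starts_with(' ') {
            flush(&mut paragraph, &mut html);
            let text = trimmed[level..].trim();
            html.push_str(&format!("<h{0}>{1}</h{0}>\n", level, escape_html(text)));
        } else if trimmed.is_empty() {
            // A blank line ends the current paragraph.
            flush(&mut paragraph, &mut html);
        } else {
            paragraph.push(trimmed);
        }
    }
    flush(&mut paragraph, &mut html);
    html
}
```

Because response headers are sent before the rewritten body, the sketch removes Content-Length at the header stage instead of trying to recompute it; that is a common approach for body-rewriting proxies, assumed here rather than taken from the original app.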

Gcore and Orange Business launch innovation program piloting joint solution to deliver sovereign inference as a service

Gcore and Orange Business have kicked off a strategic co-innovation program with the mission to deliver a scalable, production-grade AI inference service that is sovereign by design. By combining Orange Business’ secure, trusted cloud infrastructure with Gcore’s AI inference private deployment service, the collaboration empowers European enterprises and public sector organizations to run inference workloads at scale without compromising on latency, control, or compliance.

Gcore’s AI inference private deployment service is already live on Orange Business’ Cloud Avenue infrastructure. Selected enterprises across industries are actively testing it in real-world scenarios. These pilot customers are exploring how fast, secure, and compliant inference can accelerate their AI projects, cut deployment times, and reduce infrastructure overhead.

The prototype will be demonstrated at NVIDIA GTC Paris, at the Taiga Cloud booth G26. Stop by any time to see it in action.

The inference supercycle is underway

By 2030, inference is expected to comprise 70% of enterprise AI workloads. Telcos are well positioned to lead this shift thanks to their dense edge presence, licensed national data infrastructure, and long-standing trust relationships.

Gcore’s inference solution provides a sovereign, edge-native inference layer. It enables users to serve real-time, GPU-intensive applications like agentic AI, trusted LLMs, computer vision, and predictive analytics, all while staying compliant with Europe’s evolving data and AI governance frameworks.

From complexity to three clicks

Enterprise AI doesn’t need to be hard. Deploying inference workloads at scale used to demand Kubernetes fluency, large MLOps teams, and costly trial-and-error.

Now?
It’s just three clicks:
Pick a model: Choose from NVIDIA NIMs or from open-source or proprietary libraries.
Choose a region: Select one of Orange Business’ accredited European data centers.
Deploy: See your workloads go live in under 10 seconds.

Enterprises can launch inference projects faster, test ideas more quickly, and deliver production-ready AI services without spending months on ML plumbing.

Explore our blog to watch a demo showing how enterprises can deploy inference workloads in just three clicks and ten seconds.

Sovereign by design

All model data, logs, and inference results are stored exclusively within Orange Business’ own data centers in France, Germany, Norway, and Sweden. Cross-border data transfer is opt-in only, helping ensure alignment with GDPR, sector-specific regulations, and the forthcoming EU AI Act.

This platform is built for trust, transparency, and sovereignty by default. Customers maintain full control over their data, with governance built into every layer of the deployment.

Performance without trade-offs

Gcore’s AI inference solution avoids the latency spikes, cold starts, and resource waste common in traditional cloud AI setups. Key design features include:
Smart GPU routing: Directs each request to the nearest in-region GPU, delivering real-time performance with sub-50 ms latency.
Pre-loaded models: Reduces cold-start delays and improves response times.
Secure multi-tenancy: Isolates customer data while maximizing infrastructure efficiency.

The result is a production-ready inference platform optimized for both performance and compliance.

Powering the future of AI infrastructure

This partnership marks a step forward for Europe’s sovereign AI capabilities.
It highlights how telcos can serve as the backbone of next-generation AI infrastructure by hosting, scaling, and securing workloads at the edge.

With hundreds of edge POPs, trusted national networks, and deep ties across vertical industries, Orange Business is uniquely positioned to support a broad range of use cases, including real-time customer service AI, fraud detection, healthcare diagnostics, logistics automation, and public sector digital services.

What’s next: validating real-world performance

This phase of the Gcore and Orange Business program is focused on validating the solution through live customer deployments and performance benchmarks. Orange Business will gather feedback from early-access customers to inform its future sovereign inference service offering. These insights will drive refinements and shape the roadmap ahead of a full commercial launch planned for later this year.

Gcore and Orange Business are committed to delivering a sovereign inference service that meets Europe’s highest standards for speed, simplicity, and trust. This co-innovation program lays the foundation for that future.

Ready to discover how Gcore and Orange Business can deliver sovereign inference as a service for your business?

Request a preview
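
The smart GPU routing and opt-in cross-border transfer described in this post can be illustrated with a minimal Rust sketch. The GpuNode struct, its fields, the country codes, and the ranking policy (prefer nodes with the model pre-loaded, then the lowest round-trip time) are assumptions made for illustration; this is not Gcore's actual routing algorithm.

```rust
/// Hypothetical description of one GPU node; all fields are illustrative.
#[derive(Debug)]
struct GpuNode {
    id: &'static str,
    country: &'static str,
    rtt_ms: u32,        // measured round-trip time from the client
    model_loaded: bool, // model already resident in GPU memory (no cold start)
}

/// Pick a node for one request.
/// - Stay in the customer's chosen country unless cross-border transfer was opted into.
/// - Among eligible nodes, prefer a pre-loaded model, then the lowest round-trip time.
fn route<'a>(nodes: &'a [GpuNode], home_country: &str, allow_cross_border: bool) -> Option<&'a GpuNode> {
    nodes
        .iter()
        .filter(|n| allow_cross_border || n.country == home_country)
        // `false < true`, so nodes with the model loaded sort first.
        .min_by_key(|n| (!n.model_loaded, n.rtt_ms))
}
```

Filtering by country before ranking means that, with cross-border transfer disabled, a request for a region with no eligible node gets no node at all rather than silently leaving its region, which mirrors the opt-in rule.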