Gcore named a Leader in the GigaOm Radar for AI Infrastructure!Get the report
  1. Home
  2. Blog
  3. Managed Kubernetes with GPU Worker Nodes for Faster AI/ML Inference

Managed Kubernetes with GPU Worker Nodes for Faster AI/ML Inference

  • By Gcore
  • November 23, 2023
  • 6 min read
Managed Kubernetes with GPU Worker Nodes for Faster AI/ML Inference

Currently, 48% of organizations use Kubernetes for AI/ML workloads, and the demand for such workloads also drives usage patterns on Kubernetes. Let’s look at the key technical reasons behind this trend, how AI/ML workloads benefit from running on GPU worker nodes in managed K8s clusters, and some considerations regarding GPU vendors and scheduling.

Why Kubernetes is Good for AI/ML

A number of features make Kubernetes popular and effective in the AI/ML realm:

  • Scalability. K8s enables seamless, on-demand scalability of AI/ML workloads. This is especially critical for inference workloads because they are more dynamic regarding resource utilization than training workloads, and can be resource-intensive. The latter means they often require frequent scaling up or down based on the volume of data being processed.
  • Automated scheduling. The ability to automatically schedule AI/ML workloads reduces the operational overhead for MLOps teams. It also improves the performance of AI/ML applications by ensuring they are scheduled to the nodes that have the required resources.
  • Resource utilization. K8s can help to optimize physical resource utilization for AI/ML workloads. It can dynamically and automatically allocate the required amounts of CPU, GPU, and RAM resources. This is critical due to the resource-intensive nature of these workloads and the potential for cost reduction.
  • Flexibility. With K8s, you can deploy AI/ML workloads across multiple infrastructures, including on-premises, public cloud, and edge cloud. This feature also makes Kubernetes a good option for organizations that need to deploy AI/ML workloads in hybrid or multicloud environments.
  • Portability. You can easily migrate Kubernetes-based AI/ML applications between different environments and installations. This is critical for deploying and managing AI/ML workloads in a hybrid infrastructure.

Use Cases

Here are some examples of how companies have adopted Kubernetes (K8s) for their AI/ML projects:

  • OpenAI has been an early adopter of K8s. In 2017, the company was running machine learning experiments on K8s clusters. With the K8s autoscaler, OpenAI could deploy such a project in a few days and scale it up to hundreds of GPUs in a week or two. Without the Kubernetes autoscaler, such a process would take months. As a result, OpenAI increased the number of AI experiments tenfold. In 2021, the company expanded its K8s infrastructure to 7,500 nodes for large ML models such as GPT-3, DALL-E and CLIP.
  • Shell uses a K8s-based platform Kubeflow to run tests and quickly experiment with ML models on laptops. Engineers can move these workloads from the test environment to production, and the workloads will function just the same. With Kubernetes, Shell builds thousands of ML models in two hours instead of a month. The time to write the underlying code is reduced from two weeks to four hours.
  • IKEA has developed an internal MLOps platform based on K8s to train ML models on-premises and get inference in the cloud. This allows the MLOps team to orchestrate different types of trained models and, ultimately, improve the customer experience.

Of course, these examples are not broadly representative. Most companies are not fully AI-focused like OpenAI and are not as large as IKEA. They can’t afford to train large AI/ML models from scratch, which takes time and money, but instead run pretrained models and integrate them with other internal services. In other words, these companies use AI/ML inference, not training.

Inference workloads tend to be more dynamic regarding resource utilization than training workloads because production clusters are more likely to experience user and traffic spikes. In such cases, the infrastructure needs to scale up and down quickly, whereas AI/ML training typically requires gradual scaling. Therefore, for AI/ML models that are already trained and deployed, the scalability and dynamic resource utilization of K8s are especially beneficial.

Why GPU Is Better than CPU for Worker Nodes

GPU worker nodes are a better fit for containerized AI/ML workloads than CPU worker nodes for the same reasons as for non-containerized workloads: GPU offers parallel processing capabilities and higher performance for AI/ML than CPUs.

Inference for AI/ML workloads running on GPU worker nodes can be faster than those running on CPU worker nodes due to the following factors:

  • The GPU’s memory architecture is specifically optimized for AI/ML processing, enabling higher memory bandwidth than CPUs.
  • GPUs often provide better computational performance than CPUs for AI/ML training and inference because they have more transistors to process data.

Kubernetes adds its own performance benefits to those of GPUs. In addition to hardware acceleration, AI/ML workloads running on GPU worker nodes get scalability and dynamic resource allocation. Kubernetes also includes plugins for GPU vendor support, making it easy to configure GPU resources for use by AI/ML workloads.

Figure 1. The simplified K8s cluster architecture with GPU worker node

With Kubernetes, you can manage GPU resources across multiple worker nodes. Containers consume GPU resources in essentially the same way as they consume CPU resources.

GPU Vendors Comparison

There are three GPU vendors available for Kubernetes: NVIDIA, AMD, and Intel. When choosing GPU vendors for worker nodes, it’s important to keep in mind that their compatibility with Kubernetes, tool ecosystem, performance, and cost can vary.

 NVIDIA GPU worker nodesAMD GPU worker nodesIntel GPU worker nodes
Compatibility with K8sExcellentGoodGood
Tools ecosystemExcellentGoodFair
PerformanceExcellentGoodFair
CostHighMediumMedium

Let’s compare the three vendors.

  • Compatibility with Kubernetes: NVIDIA is the most compatible with K8s. The company provides CUDA drivers, various container runtimes, and other tools and features that simplify GPU integration and management. AMD and Intel support for K8s is less mature and often requires custom configuration.
  • Tools ecosystem: NVIDIA has the best ecosystem of tools for AI/ML, thanks to software such as the GPU Operator and Container Toolkit, and ML frameworks adapted for NVIDIA GPUs, such as TensorFlow, PyTorch, and MXNet. AMD and Intel also have tools for AI/ML, but they are not as comprehensive as NVIDIA’s.
  • Performance: NVIDIA GPUs are known for their high performance on AI workloads, outperforming the competition on most MLPerf benchmarks. NVIDIA GPUs are ideal for demanding tasks such as deep learning and high-performance computing.
  • Cost: NVIDIA GPUs are the most expensive type of GPU worker node.
  • Flexibility: NVIDIA offers several features that make its GPU-based K8s clusters highly flexible in terms of management and resource utilization compared to its competitors:
    • Multi-instance GPU (MIG) mechanism for NVIDIA A100 GPU to allow a GPU to be securely partitioned into up to seven separate instances for better GPU utilization
    • Multicloud GPU clusters, which can be seamlessly managed and scaled as if deployed in a single cloud
    • Heterogeneous GPU and CPU clusters to simplify the training and management of distributed deep learning models
    • GPU metrics monitoring with Prometheus and visualization with Grafana
    • Support for multiple container runtimes, including Docker, CRI-O, and containers

In summary, NVIDIA GPU worker nodes are the best choice for AI/ML workloads in Kubernetes. They offer the best compatibility with K8s, the best tools ecosystem, and the best performance. That’s why we chose NVIDIA GPUs for Gcore Managed Kubernetes. Our customers get all the benefits of NVIDIA, including the highest performance level for faster training and inference of their AI/ML workloads.

Important Specifics of GPU Scheduling in Kubernetes

To enable GPU scheduling and allow pods to access its resources, you need to install a vendor-specific device plugin from your chosen GPU vendor — NVIDIA, AMD, or Intel.

Pods request GPU resources in the same way they request CPU resources. However, Kubernetes is less flexible with GPU than with CPU when it comes to configuring `limits` and `requests`. With `requests`, you set the amount of resources that a pod is guaranteed to get, such as a minimum quantity. With `limits`, you set the amount of resources that won’t be exceeded, for instance, a maximum quantity. When configuring a pod manifest for GPU requests, `limits` and `requests` should be equal, meaning that a pod won’t get more resources than guaranteed if, for example, the application needs them.

Also, by default, you can’t allocate part of a GPU or multiple GPUs to a container because of the way CPU allocation works. You can only allocate one full GPU per container. This limitation doesn’t help with resource economics. But NVIDIA has managed to overcome this. With its GPU, you can use either use:

  • Time-sharing GPUs, which work by sequentially assigning time intervals to shared containers on a physical GPU. This works for all NVIDIA GPUs.
  • Multi-instance GPUs, which allow a GPU to be divided into up to seven instances for better GPU utilization. This only works with the NVIDIA A100 GPU.

These two features help you to use NVIDIA GPU resources more efficiently and save money on renting GPU instances in the cloud. This is also a significant advantage over other GPU vendors.

Managed Kubernetes vs. Vanilla Kubernetes with GPU

A managed Kubernetes service can offer several advantages over vanilla (open source) Kubernetes for AI/ML workloads running on GPU worker nodes:

  • Flexible choice of GPUs. Managed K8s services typically provide support for GPU instances with various specifications. This makes it easier to choose the appropriate level of GPU acceleration for your AI/ML workloads.
  • Reduced operational overhead. Managed Kubernetes handles the everyday responsibilities of overseeing a Kubernetes cluster, like managing the control plane and implementing K8s updates. This enables you to focus on creating, deploying and managing AI/ML applications.
  • Scalability and reliability. Managed K8s services are typically designed with a strong focus on scalability and reliability, ensuring that your AI/ML workloads can adeptly handle fluctuating traffic and spikes in resource demand.

Gcore Managed Kubernetes with NVIDIA GPU Workers

Gcore Managed Kubernetes helps you to deploy Kubernetes clusters fast, without the need to maintain the underlying infrastructure and Kubernetes backend. The Gcore team controls the master nodes while you control only the worker nodes, reducing your operational burden. Worker nodes can be Gcore Virtual Machines or Bare Metal servers in various configurations, including those with NVIDIA GPU modules.

Conclusion

Managed Kubernetes with GPU worker nodes is a powerful and flexible combination for accelerating AI/ML inference. By taking advantage of both Kubernetes and GPUs, managed Kubernetes with GPU worker nodes can help you improve the performance and efficiency of your AI/ML workloads. The service also frees you from the need to maintain the underlying GPU infrastructure and most Kubernetes components.

Gcore Managed Kubernetes can boost your AI/ML workloads with GPU worker nodes on Bare Metal for faster inference and operational efficiency. We offer a 99.9% SLA with free production management and free egress traffic—at outstanding value for money.

Explore Managed Kubernetes

Related articles

From budget strain to AI gain: Watch how studios are building smarter with AI

Game development is in a pressure cooker. Budgets are ballooning, infrastructure and labor costs are rising, and players expect more complexity and polish with every release. All studios, from the major AAAs to smaller indies, are feeling the strain.But there is a way forward. In a recent webinar, Sean Hammond, Territory Manager for the UK and Nordics at Gcore, explained how AI is reshaping game development workflows and how the right infrastructure strategy can reduce costs, speed up production, and create better player experiences.Scroll on to watch key moments from Sean's talk and explore how studios can make AI work for them.Rising costs are threatening game developmentGame revenue has slowed, but development costs continue to rise. Some AAA titles now surpass $100 million in development budgets. The complexity of modern games demands more powerful servers, scalable infrastructure, and larger teams, making the industry increasingly unsustainable.Personnel and infrastructure costs are also climbing. Developers, artists, and QA testers with specialized skills are in high demand, as are technologies like VR, AR, and AI. Studios are also having to invest more in cybersecurity to protect player data, detect cheating, and safeguard in-game economies.AI is revolutionizing GameDev, even without a perfect use caseWhile the perfect use case for AI in gaming may not have been found yet, it’s already transforming how games are built, tested, and personalized.Sean highlighted emerging applications, including:Smarter QA testingAI-driven player personalizationReal-time motion and animationAccelerated environment and character designMultilingual localizationAdaptive game balancingStudios are already applying these technologies to reduce production timelines and improve immersion.The challenge of secure, scalable AI adoptionOf course, AI adoption doesn’t come without its challenges. Chief among them is security. Public models pose risks: no studio wants their proprietary assets to end up training a competitor’s model.The solution? Deploy AI models on infrastructure you trust so you’re in complete control. That’s where Gcore comes in.Gcore Everywhere Inference reduces compute costs and infrastructure bloat by allowing you to deploy only what you need, where you need it.The future of gaming is AI at scaleTo power real-time player experiences, your studio needs to deploy AI globally, close to your users.Gcore Everywhere Inference lets you deploy models worldwide at the edge with minimal latency because data is not routed back to central servers. This means fast, responsive gameplay and a new generation of real-time, AI-driven features.As a company originally built by gamers, we’ve developed AI solutions with gaming studios in mind. Here’s what we offer:Global edge inference for real-time gameplay: Deploy your AI models close to players worldwide, enabling fast, responsive player experiences without routing data to central servers.Full control over AI model deployment and IP protection: Avoid public APIs and retain full ownership of your assets with on-prem options, preventing your proprietary data from being available to competitors.Scalable, cost-efficient infrastructure tailored to gaming workloads: Deploy only what you need to avoid overprovisioning and reduce compute costs without sacrificing performance.Enhanced player retention through AI-driven personalization and matchmaking: Real-time inference powers smarter NPCs and dynamic matchmaking, improving engagement and keeping players coming back for more.Deploy models in 3 clicks and under 10 seconds: Our developer-friendly platform lets you go from trained model to global deployment in seconds. No complex DevOps setup required.Final takeawayAI is advancing game development fast, but only if it’s deployed right. Gcore offers scalable, secure, and cost-efficient AI infrastructure that helps studios create smarter, faster, and more immersive games.Want to see how it works? Deploy your first model in just a few clicks.Check out our blog on how AI is transforming gaming in 2025

No capacity = no defense: rethinking DDoS resilience at scale

DDoS attacks are growing so massive they are overwhelming the very infrastructure designed to stop them. Earlier this year, a peak attack exceeding 7 Tbps was recorded, while 1–2 Tbps attacks have become everyday occurrences. Such volumes were unimaginable just a few years ago.Yet many businesses still depend on mitigation systems that were not designed to scale alongside this rapid attack growth. While these systems may have smart detection, that advantage is moot if physical infrastructure cannot handle the load. Today, raw capacity is non-negotiable — intelligent filtering alone isn’t enough; you need vast, globally distributed throughput.Lukasz Karwacki, Gcore’s Security Solution Architect specializing in DDoS, explains why modern DDoS protection requires immense capacity, global distribution, and resilient routing. Scroll down to watch him describe why a globally distributed defense model is now the minimum standard for mitigating devastating DDoS attacks.DDoS is a capacity war, not just a traffic spikeThe central challenge in DDoS mitigation today is the total attack volume versus total available throughput.Attacks do not originate from a single location. Global botnets harness compromised devices across Asia, Africa, Europe, and the Americas. When all this traffic converges on a single data center, it creates a structural mismatch: a single site’s limited capacity pitted against the full bandwidth of the internet.Anycast is non-negotiable for global capacityTo counter today’s attack volumes, mitigation capacity must be distributed globally, and that’s where Anycast routing plays a critical role.Anycast routes incoming traffic to the nearest available scrubbing center. If one region is overwhelmed or offline, traffic is automatically redirected elsewhere. This eliminates single points of failure and enables the absorption of massive attacks without compromising service availability.By contrast, static mitigation pipelines create bottlenecks: all traffic funnels through a single point, making it easy for attackers to overwhelm that location. Centralized mitigation means centralized failure. The more distributed your infrastructure, the harder it is to take down — that’s resilient network design.Why always-on cloud defense outperforms on-demand protectionSome DDoS defenses activate only when an attack is detected. These on-demand models may save costs but introduce a brief delay while traffic is rerouted and protections come online.Even a few seconds of delay can allow a high-speed attack to inflict damage.Gcore’s cloud-native DDoS protection is always-on, continuously monitoring, filtering, and balancing traffic across all scrubbing centers. This means no activation lag and no dependency on manual triggers.Capacity is the new baseline for protectionModern DDoS attacks focus less on sophistication and more on sheer scale. Attackers simply overwhelm infrastructure by flooding it with more traffic than it can handle.True DDoS protection begins with capacity planning — not just signatures or rulesets. You need sufficient bandwidth, processing power, and geographic distribution to absorb attacks before they reach your core systems.At Gcore, we’ve built a globally distributed DDoS mitigation network with over 200 Tbps capacity, 40+ protected data centers, and thousands of peering partners. Using Anycast routing and always-on defense, our infrastructure withstands attacks that other systems simply can’t.Many customers turn to Gcore for DDoS protection after other providers fail to keep up with attack capacity.Find out why Fawkes Games turned to Gcore for DDoS protection

How AI-enhanced content moderation is powering safe and compliant streaming

How AI-enhanced content moderation is powering safe and compliant streaming

As streaming experiences a global boom across platforms, regions, and industries, providers face a growing challenge: how to deliver safe, respectful, and compliant content delivery at scale. Viewer expectations have never been higher, likewise the regulatory demands and reputational risks.Live content in particular leaves little room for error. A single offensive comment, inappropriate image, or misinformation segment can cause long-term damage in seconds.Moderation has always been part of the streaming conversation, but tools and strategies are evolving rapidly. AI-powered content moderation is helping providers meet their safety obligations while preserving viewer experience and platform performance.In this article, we explore how AI content moderation works, where it delivers value, and why streaming platforms are adopting it to stay ahead of both audience expectations and regulatory pressures.Real-time problems require real-time solutionsHuman moderators can provide accuracy and context, but they can’t match the scale or speed of modern streaming environments. Live streams often involve thousands of viewers interacting at once, with content being generated every second through audio, video, chat, or on-screen graphics.Manual review systems struggle to keep up with this pace. In some cases, content can go viral before it is flagged, like deepfakes that circulated on Facebook leading up to the 2025 Canadian election. In others, delays in moderation result in regulatory penalties or customer churn, like X’s 2025 fine under the EU Digital Services Act for shortcomings in content moderation and algorithm transparency. This has created a demand for scalable solutions that act instantly, with minimal human intervention.AI-enhanced content moderation platforms address this gap. These systems are trained to identify and filter harmful or non-compliant material as it is being streamed or uploaded. They operate across multiple modalities—video frames, audio tracks, text inputs—and can flag or remove content within milliseconds of detection. The result is a safer environment for end users.How AI moderation systems workModern AI moderation platforms are powered by machine learning algorithms trained on extensive datasets. These datasets include a wide variety of content types, languages, accents, dialects, and contexts. By analyzing this data, the system learns to identify content that violates platform policies or legal regulations.The process typically involves three stages:Input capture: The system monitors live or uploaded content across audio, video, and text layers.Pattern recognition: It uses models to identify offensive content, including nudity, violence, hate speech, misinformation, or abusive language.Contextual decision-making: Based on confidence thresholds and platform rules, the system flags, blocks, or escalates the content for review.This process is continuous and self-improving. As the system receives more inputs and feedback, it adapts to new forms of expression, regional trends, and platform-specific norms.What makes this especially valuable for streaming platforms is its low latency. Content can be flagged and removed in real time, often before viewers even notice. This is critical in high-stakes environments like esports, corporate webinars, or public broadcasts.Multi-language moderation and global streamingStreaming audiences today are truly global. Content crosses borders faster than ever, but moderation standards and cultural norms do not. What’s considered acceptable in one region may be flagged as offensive in another. A word that is considered inappropriate in one language might be completely neutral in another. A piece of nudity in an educational context may be acceptable, while the same image in another setting may not be. Without the ability to understand nuance, AI systems risk either over-filtering or letting harmful content through.That’s why high-quality moderation platforms are designed to incorporate context into their models. This includes:Understanding tone, not just keywordsRecognizing culturally specific gestures or idiomsAdapting to evolving slang or coded languageApplying different standards depending on content type or target audienceThis enables more accurate detection of harmful material and avoids false positives caused by mistranslation.Training AI models for multi-language support involves:Gathering large, representative datasets in each languageTeaching the model to detect content-specific risks (e.g., slurs or threats) in the right cultural contextContinuously updating the model as language evolvesThis capability is especially important for platforms that operate in multiple markets or support user-generated content. It enables a more respectful experience for global audiences while providing consistent enforcement of safety standards.Use cases across the streaming ecosystemAI moderation isn’t just a concern for social platforms. It plays a growing role in nearly every streaming vertical, including the following:Live sports: Real-time content scanning helps block offensive chants, gestures, or pitch-side incidents before they reach a wide audience. Fast filtering protects the viewer experience and helps meet broadcast standards.Esports: With millions of viewers and high emotional stakes, esports platforms rely on AI to remove hate speech and adult content from chat, visuals, and commentary. This creates a more inclusive environment for fans and sponsors alike.Corporate live events: From earnings calls to virtual town halls, organizations use AI moderation to help ensure compliance with internal communication guidelines and protect their reputation.Online learning: EdTech platforms use AI to keep classrooms safe and focused. Moderation helps filter distractions, harassment, and inappropriate material in both live and recorded sessions.On-demand entertainment: Even outside of live broadcasts, moderation helps streaming providers meet content standards and licensing obligations across global markets. It also ensures user-submitted content (like comments or video uploads) meets platform guidelines.In each case, the shared goal is to provide a safe and trusted streaming environment for users, advertisers, and creators.Balancing automation with human oversightAI moderation is a powerful tool, but it shouldn’t be the only one. The best systems combine automation with clear review workflows, configurable thresholds, and human input.False positives and edge cases are inevitable. Giving moderators the ability to review, override, or explain decisions is important for both quality control and user trust.Likewise, giving users a way to appeal moderation decisions or report issues ensures that moderation doesn’t become a black box. Transparency and user empowerment are increasingly seen as part of good platform governance.Looking ahead: what’s next for AI moderationAs streaming becomes more interactive and immersive, moderation will need to evolve. AI systems will be expected to handle not only traditional video and chat, but also spatial audio, avatars, and real-time user inputs in virtual environments.We can also expect increased demand for:Personalization, where viewers can set their own content preferencesIntegration with platform APIs for programmatic content governanceCross-platform consistency to support syndicated content across partnersAs these changes unfold, AI moderation will remain central to the success of modern streaming. Platforms that adopt scalable, adaptive moderation systems now will be better positioned to meet the next generation of content challenges without compromising on speed, safety, or user experience.Keep your streaming content safe and compliant with GcoreGcore Video Streaming offers AI Content Moderation that satisfies today’s digital safety concerns while streamlining the human moderation process.To explore how Gcore AI Content Moderation can transform your digital platform, we invite you to contact our streaming team for a demonstration. Our docs provide guidance for using our intuitive Gcore Customer Portal to manage your streaming content. We also provide a clear pricing comparison so you can assess the value for yourself.Embrace the future of content moderation and deliver a safer, more compliant digital space for all your users.Try AI Content Moderation for freeTry AI Content Moderation for free

Deploy GPT-OSS-120B privately on Gcore

OpenAI’s release of GPT-OSS-120B is a turning point for LLM developers. It’s a 120B parameter model trained from scratch, licensed for commercial use, and available with open weights. This is a serious asset for serious builders.Gcore now supports private GPT-OSS-120B deployments via our Everywhere Inference platform. That means you can stand up your own endpoint in minutes, run inference at scale, and control the full stack, without API limits, vendor lock-in, or hidden usage fees. Just fast, secure, controlled deployment on your terms. Deploy now in three clicks or read on to learn more.Why GPT-OSS-120B is big news for buildersThis model changes the game for anyone developing AI apps, platforms, or infrastructure. It brings GPT-3-level reasoning to the open-source ecosystem and frees developers from closed APIs.With GPT-OSS-120B, you get:Full access to model weights and architectureSelf-hosting for maximum data control and privacySupport for fine-tuning and model editingOffline deployment for secure or air-gapped useMassive cost savings at scaleYou can deploy in any Gcore region (or leverage Gcore’s three-click serverless inference on your own infrastructure), route traffic through your own stack, and fully control load, latency, and logs. This is LLM deployment for real-world apps, not just playground prompts.How to deploy GPT-OSS-120B with Gcore Everywhere InferenceGcore Everywhere Inference gives you a clean path from open model to production endpoint. You can spin up a dedicated deployment in just three clicks. We offer configuration options to suit your business needs:Choose your location (cloud or on-prem)Integrate via standard APIs (OpenAI-compatible)Control usage, autoscale, and costsDeploying GPT-OSS-120B on Gcore takes just three clicks in the Gcore Customer Portal.There are no shared endpoints. You get dedicated compute, low-latency routing, and full control and observability.You can also bring your own trained variant if you’ve fine-tuned GPT-OSS-120B elsewhere. We’ll help you host it reliably, close to your users.Use cases: where GPT-OSS-120B fits bestCommercial GPTs still outperform OSS models on some general tasks, but GPT-OSS-120B gives you control, portability, and flexibility where it counts. Most importantly, it gives you the ability to build privacy-sensitive applications.Great fits include:Internal dev tools and copilotsRetrieval-augmented generation (RAG) pipelinesSecure, private enterprise assistantsData-sensitive, on-prem AI workloadsModels requiring full customization or fine-tuningIt’s especially relevant for finance, healthcare, government, and legal teams operating under strict compliance rules.Deploy GPT-OSS-120B todayWant to learn more about GPT-OSS-120B and why Gcore is an ideal provider for deployment? Get all the information you need on our dedicated page.And if you’re ready to deploy in just three clicks, head on over to the Gcore Customer Portal. GPT-OSS-120B is waiting for you in the Application Catalog.Learn more about deploying GPT-OSS-120B on Gcore

Announcing new tools, apps, and regions for your real-world AI use cases

Three updates, one shared goal: helping builders move faster with AI. Our latest releases for Gcore Edge AI bring real-world AI deployments within reach, whether you’re a developer integrating genAI into a workflow, an MLOps team scaling inference workloads, or a business that simply needs access to performant GPUs in the UK.MCP: make AI do moreGcore’s MCP server implementation is now live on GitHub. The Model Context Protocol (MCP) is an open standard, originally developed by Anthropic, that turns AI models into agents that can carry out real-world tasks. It allows you to plug genAI models into everyday tools like Slack, email, Jira, and databases, so your genAI can read, write, and reason directly across systems. Think of it as a way to turn “give me a summary” into “send that summary to the right person and log the action.”“AI needs to be useful, not just impressive. MCP is a critical step toward building AI systems that drive desirable business outcomes, like automating workflows, integrating with enterprise tools, and operating reliably at scale. At Gcore, we’re focused on delivering that kind of production-grade AI through developer-friendly services and top-of-the-range infrastructure that make real-world deployment fast and easy.” — Seva Vayner, Product Director of Edge Cloud and AI, GcoreTo get started, clone the repo, explore the toolsets, and test your own automations.Gcore Application Catalog: inference without overheadWe’ve upgraded the Gcore Model Catalog into something even more powerful: an Application Catalog for AI inference. You can still deploy the latest open models with three clicks. But now, you can also tune, share, and scale them like real applications.We’ve re-architected our inference solution so you can:Run prefill and decode stages in parallelShare KV cache across pods (it’s not tied to individual GPUs) from August 2025Toggle WebUI and secure API independently from August 2025These changes cut down on GPU memory usage, make deployments more flexible, and reduce time to first token, especially at scale. And because everything is application-based, you’ll soon be able to optimize for specific business goals like cost, latency, or throughput.Here’s who benefits:ML engineers can deploy high-throughput workloads without worrying about memory overheadBackend developers get a secure API, no infra setup neededProduct teams can launch demos instantly with the WebUI toggleInnovation labs can move from prototype to production without reconfiguringPlatform engineers get centralized caching and predictable scalingThe new Application Catalog is available now through the Gcore Customer Portal.Chester data center: NVIDIA H200 capacity in the UKGcore’s newest AI cloud region is now live in Chester, UK. This marks our first UK location in partnership with Northern Data. Chester offers 2000 NVIDIA H200 GPUs with BlueField-3 DPUs for secure, high-throughput compute on Gcore GPU Cloud, serving your training and inference workloads. You can reserve your H200 GPU immediately via the Gcore Customer Portal.This launch solves a growing problem: UK-based companies building with AI often face regional capacity shortages, long wait times, or poor performance when routing inference to overseas data centers. Chester fixes that with immediate availability on performant GPUs.Whether you’re training LLMs or deploying inference for UK and European users, Chester offers local capacity, low latency, and impressive capacity and availability.Next stepsExplore the MCP server and start building agentic workflowsTry the new Application Catalog via the Gcore Customer PortalDeploy your workloads in Chester for high-performance UK-based computeDeploy your AI workload in three clicks today!

Gcore recognized as a Leader in the 2025 GigaOm Radar for AI Infrastructure

Gcore recognized as a Leader in the 2025 GigaOm Radar for AI Infrastructure

We’re proud to share that Gcore has been named a Leader in the 2025 GigaOm Radar for AI Infrastructure—the only European provider to earn a top-tier spot. GigaOm’s rigorous evaluation highlights our leadership in platform capability and innovation, and our expertise in delivering secure, scalable AI infrastructure.Inside the GigaOm Radar: what’s behind the Leader statusThe GigaOm Radar report is a respected industry analysis that evaluates top vendors in critical technology spaces. In this year’s edition, GigaOm assessed 14 of the world’s leading AI infrastructure providers, measuring their strengths across key technical and business metrics. It ranks providers based on factors such as scalability and performance, deployment flexibility, security and compliance, and interoperability.Alongside the ranking, the report offers valuable insights into the evolving AI infrastructure landscape, including the rise of hybrid AI architectures, advances in accelerated computing, and the increasing adoption of edge deployment to bring AI closer to where data is generated. It also offers strategic takeaways for organizations seeking to build scalable, secure, and sovereign AI capabilities.Why was Gcore named a top provider?The specific areas in which Gcore stood out and earned its Leader status are as follows:A comprehensive AI platform offering Everywhere Inference and GPU Cloud solutions that support scalable AI from model development to productionHigh performance powered by state-of-the-art NVIDIA A100, H100, H200 and GB200 GPUs and a global private network ensuring ultra-low latencyAn extensive model catalogue with flexible deployment options across cloud, on-premises, hybrid, and edge environments, enabling tailored global AI solutionsExtensive capacity of cutting-edge GPUs and technical support in Europe, supporting European sovereign AI initiativesChoosing Gcore AI is a strategic move for organizations prioritizing ultra-low latency, high performance, and flexible deployment options across cloud, on-premises, hybrid, and edge environments. Gcore’s global private network ensures low-latency processing for real-time AI applications, which is a key advantage for businesses with a global footprint.GigaOm Radar, 2025Discover more about the AI infrastructure landscapeAt Gcore, we’re dedicated to driving innovation in AI infrastructure. GPU Cloud and Everywhere Inference empower organizations to deploy AI efficiently and securely, on their terms.If you’re planning your AI infrastructure roadmap or rethinking your current one, this report is a must-read. Explore the report to discover how Gcore can support high-performance AI at scale and help you stay ahead in an AI-driven world.Download the full report

Subscribe to our newsletter

Get the latest industry trends, exclusive insights, and Gcore updates delivered straight to your inbox.