
Managed Kubernetes with GPU Worker Nodes for Faster AI/ML Inference

  • By Gcore
  • November 23, 2023
  • 6 min read

Currently, 48% of organizations use Kubernetes for AI/ML workloads, and demand for these workloads is increasingly shaping how Kubernetes is used. Let’s look at the key technical reasons behind this trend, how AI/ML workloads benefit from running on GPU worker nodes in managed K8s clusters, and some considerations regarding GPU vendors and scheduling.

Why Kubernetes is Good for AI/ML

A number of features make Kubernetes popular and effective in the AI/ML realm:

  • Scalability. K8s enables seamless, on-demand scaling of AI/ML workloads. This is especially critical for inference workloads, which are more dynamic in their resource utilization than training workloads and can be resource-intensive, so they often need to scale up or down quickly based on the volume of data being processed.
  • Automated scheduling. The ability to automatically schedule AI/ML workloads reduces the operational overhead for MLOps teams. It also improves the performance of AI/ML applications by ensuring they are scheduled to the nodes that have the required resources.
  • Resource utilization. K8s can help to optimize physical resource utilization for AI/ML workloads. It can dynamically and automatically allocate the required amounts of CPU, GPU, and RAM resources. This is critical due to the resource-intensive nature of these workloads and the potential for cost reduction.
  • Flexibility. With K8s, you can deploy AI/ML workloads across multiple infrastructures, including on-premises, public cloud, and edge cloud. This feature also makes Kubernetes a good option for organizations that need to deploy AI/ML workloads in hybrid or multicloud environments.
  • Portability. You can easily migrate Kubernetes-based AI/ML applications between different environments and installations. This is critical for deploying and managing AI/ML workloads in a hybrid infrastructure.

Use Cases

Here are some examples of how companies have adopted Kubernetes (K8s) for their AI/ML projects:

  • OpenAI was an early adopter of K8s. In 2017, the company was running machine learning experiments on K8s clusters. With the K8s autoscaler, OpenAI could deploy an experiment in a few days and scale it to hundreds of GPUs within a week or two; without the autoscaler, that process would have taken months. As a result, OpenAI increased the number of AI experiments tenfold. In 2021, the company expanded its K8s infrastructure to 7,500 nodes for large ML models such as GPT-3, DALL-E, and CLIP.
  • Shell uses Kubeflow, a K8s-based platform, to run tests and quickly experiment with ML models on laptops. Engineers can move these workloads from the test environment to production, and the workloads function just the same. With Kubernetes, Shell builds thousands of ML models in two hours instead of a month, and the time to write the underlying code has dropped from two weeks to four hours.
  • IKEA has developed an internal MLOps platform based on K8s to train ML models on-premises and run inference in the cloud. This allows the MLOps team to orchestrate different types of trained models and, ultimately, improve the customer experience.

Of course, these examples are not broadly representative. Most companies are not fully AI-focused like OpenAI and are not as large as IKEA. They can’t afford to train large AI/ML models from scratch, which takes time and money, but instead run pretrained models and integrate them with other internal services. In other words, these companies use AI/ML inference, not training.

Inference workloads tend to be more dynamic regarding resource utilization than training workloads because production clusters are more likely to experience user and traffic spikes. In such cases, the infrastructure needs to scale up and down quickly, whereas AI/ML training typically requires gradual scaling. Therefore, for AI/ML models that are already trained and deployed, the scalability and dynamic resource utilization of K8s are especially beneficial.
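To make this concrete, here is a minimal sketch of how a Kubernetes HorizontalPodAutoscaler can scale an inference service automatically. It assumes a hypothetical Deployment named `inference-api` and scales on CPU utilization; in practice you might scale on custom metrics such as request rate or GPU utilization instead:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-api-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-api          # hypothetical inference Deployment
  minReplicas: 2                 # keep a baseline for steady traffic
  maxReplicas: 20                # cap scale-out during traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add replicas when average CPU exceeds 70%
```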

Why GPU Is Better than CPU for Worker Nodes

GPU worker nodes are a better fit for containerized AI/ML workloads than CPU worker nodes for the same reason they are for non-containerized workloads: GPUs offer parallel processing capabilities and higher performance for AI/ML than CPUs.

AI/ML inference running on GPU worker nodes can be faster than inference on CPU worker nodes due to the following factors:

  • GPU memory architecture is optimized for high-throughput data processing, providing much higher memory bandwidth than CPUs.
  • GPUs often deliver better computational performance than CPUs for AI/ML training and inference because they dedicate far more transistors to parallel data processing.

Kubernetes adds its own benefits on top of GPU hardware acceleration: AI/ML workloads running on GPU worker nodes also get scalability and dynamic resource allocation. Kubernetes supports vendor-provided device plugins that make it straightforward to expose GPU resources to AI/ML workloads.

Figure 1. Simplified K8s cluster architecture with a GPU worker node

With Kubernetes, you can manage GPU resources across multiple worker nodes. Containers consume GPU resources in essentially the same way as they consume CPU resources.

GPU Vendors Comparison

Three GPU vendors are available for Kubernetes worker nodes: NVIDIA, AMD, and Intel. When choosing a vendor, keep in mind that compatibility with Kubernetes, tooling ecosystem, performance, and cost all vary.

|  | NVIDIA GPU worker nodes | AMD GPU worker nodes | Intel GPU worker nodes |
| --- | --- | --- | --- |
| Compatibility with K8s | Excellent | Good | Good |
| Tools ecosystem | Excellent | Good | Fair |
| Performance | Excellent | Good | Fair |
| Cost | High | Medium | Medium |

Let’s compare the three vendors.

  • Compatibility with Kubernetes: NVIDIA is the most compatible with K8s. The company provides CUDA drivers, container runtime integrations, and other tools and features that simplify GPU integration and management. AMD and Intel support for K8s is less mature and often requires custom configuration.
  • Tools ecosystem: NVIDIA has the best ecosystem of tools for AI/ML, thanks to software such as the GPU Operator and Container Toolkit, and ML frameworks adapted for NVIDIA GPUs, such as TensorFlow, PyTorch, and MXNet. AMD and Intel also have tools for AI/ML, but they are not as comprehensive as NVIDIA’s.
  • Performance: NVIDIA GPUs are known for their high performance on AI workloads, outperforming the competition on most MLPerf benchmarks. NVIDIA GPUs are ideal for demanding tasks such as deep learning and high-performance computing.
  • Cost: NVIDIA GPUs are the most expensive type of GPU worker node.
  • Flexibility: NVIDIA offers several features that make its GPU-based K8s clusters highly flexible in terms of management and resource utilization compared to its competitors:
    • Multi-instance GPU (MIG) mechanism for NVIDIA A100 GPU to allow a GPU to be securely partitioned into up to seven separate instances for better GPU utilization
    • Multicloud GPU clusters, which can be seamlessly managed and scaled as if deployed in a single cloud
    • Heterogeneous GPU and CPU clusters to simplify the training and management of distributed deep learning models
    • GPU metrics monitoring with Prometheus and visualization with Grafana
    • Support for multiple container runtimes, including Docker, CRI-O, and containerd

In summary, NVIDIA GPU worker nodes are the best choice for AI/ML workloads in Kubernetes. They offer the best compatibility with K8s, the best tools ecosystem, and the best performance. That’s why we chose NVIDIA GPUs for Gcore Managed Kubernetes. Our customers get all the benefits of NVIDIA, including the highest performance level for faster training and inference of their AI/ML workloads.

Important Specifics of GPU Scheduling in Kubernetes

To enable GPU scheduling and allow pods to access GPU resources, you need to install the device plugin from your chosen GPU vendor: NVIDIA, AMD, or Intel.

Pods request GPU resources in the same way they request CPU resources. However, Kubernetes is less flexible with GPUs than with CPUs when it comes to configuring `limits` and `requests`. With `requests`, you set the minimum amount of resources a pod is guaranteed to get; with `limits`, you set the maximum amount it can’t exceed. In a pod manifest that requests GPUs, `limits` and `requests` must be equal, which means a pod never gets more GPU resources than it is guaranteed, even if the application could use them.
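For illustration, here is a minimal pod manifest that requests a single NVIDIA GPU through the `nvidia.com/gpu` extended resource exposed by the device plugin. The pod name, image, and command are placeholders; for extended resources like GPUs, specifying only `limits` is enough, since `requests` defaults to the same value:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference-pod                       # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: inference
      image: nvcr.io/nvidia/pytorch:23.10-py3   # placeholder: any CUDA-enabled image works
      command: ["python", "serve.py"]           # hypothetical inference entrypoint
      resources:
        limits:
          nvidia.com/gpu: 1                     # whole GPUs only; fractions aren't possible by default
```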

Also, by default, GPUs are allocated to containers only in whole units: a container can’t use a fraction of a GPU, and two containers can’t share the same GPU. This limitation doesn’t help with resource economics, but NVIDIA has managed to overcome it. With NVIDIA GPUs, you can use either:

  • Time-sharing GPUs, which work by sequentially assigning time intervals to shared containers on a physical GPU. This works for all NVIDIA GPUs.
  • Multi-instance GPUs, which allow a GPU to be divided into up to seven instances for better GPU utilization. This only works with the NVIDIA A100 GPU.

These two features help you to use NVIDIA GPU resources more efficiently and save money on renting GPU instances in the cloud. This is also a significant advantage over other GPU vendors.
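As a rough sketch of how time-sharing is configured, the NVIDIA device plugin (and the GPU Operator) accept a time-slicing configuration like the one below, where `replicas: 4` makes each physical GPU appear as four schedulable `nvidia.com/gpu` resources shared in time slices. The ConfigMap name, namespace, and the way it is referenced from the plugin or operator are assumptions that depend on your version and setup, so treat this as illustrative rather than a drop-in config:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-time-slicing-config   # hypothetical name, referenced from the device plugin/GPU Operator settings
  namespace: gpu-operator            # assumption: namespace where the GPU Operator is installed
data:
  time-slicing: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4              # each physical GPU is advertised as 4 shareable GPU resources
```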

Managed Kubernetes vs. Vanilla Kubernetes with GPU

A managed Kubernetes service can offer several advantages over vanilla (open source) Kubernetes for AI/ML workloads running on GPU worker nodes:

  • Flexible choice of GPUs. Managed K8s services typically provide support for GPU instances with various specifications. This makes it easier to choose the appropriate level of GPU acceleration for your AI/ML workloads.
  • Reduced operational overhead. Managed Kubernetes handles the everyday responsibilities of overseeing a Kubernetes cluster, like managing the control plane and implementing K8s updates. This enables you to focus on creating, deploying and managing AI/ML applications.
  • Scalability and reliability. Managed K8s services are typically designed with a strong focus on scalability and reliability, ensuring that your AI/ML workloads can adeptly handle fluctuating traffic and spikes in resource demand.

Gcore Managed Kubernetes with NVIDIA GPU Workers

Gcore Managed Kubernetes helps you to deploy Kubernetes clusters fast, without the need to maintain the underlying infrastructure and Kubernetes backend. The Gcore team controls the master nodes while you control only the worker nodes, reducing your operational burden. Worker nodes can be Gcore Virtual Machines or Bare Metal servers in various configurations, including those with NVIDIA GPU modules.

Conclusion

Managed Kubernetes with GPU worker nodes is a powerful and flexible combination for accelerating AI/ML inference. By pairing Kubernetes orchestration with GPU hardware acceleration, it can improve both the performance and the efficiency of your AI/ML workloads. The service also frees you from maintaining the underlying GPU infrastructure and most Kubernetes components.

Gcore Managed Kubernetes can boost your AI/ML workloads with GPU worker nodes on Bare Metal for faster inference and operational efficiency. We offer a 99.9% SLA with free production management and free egress traffic—at outstanding value for money.

Explore Managed Kubernetes
