Home
Blog
GPU Acceleration in AI: How Graphics Processing Units Drive Deep Learning

GPU Acceleration in AI: How Graphics Processing Units Drive Deep Learning

September 19, 2023

9 min read

GPU Acceleration in AI: How Graphics Processing Units Drive Deep Learning

This article discusses how GPUs are shaping a new reality in the hottest subset of AI training: deep learning. We’ll explain the GPU architecture and how it fits with AI workloads, why GPUs are better than CPUs for training deep learning models, and how to choose an optimal GPU configuration.

How GPUs Drive Deep Learning

The key GPU features that power deep learning are its parallel processing capability and, at the foundation of this capability, its core (processor) architecture.

Parallel Processing

Deep learning (DL) relies on matrix calculations, which are performed effectively using the parallel computing that GPUs provide. To understand this interrelationship better, let’s consider a simplified training process of a deep learning model. The model takes the input data, such as images, and has to recognize a specific object in these images using a correlation matrix. The matrix summarizes a data set, identifies patterns, and returns results accordingly: If the object is recognized, the model labels it “true”, otherwise it is labeled “false.” Below is a simplified illustration of this process.

Figure 1. The simplified illustration of the DL training process

An average DL model has billions of parameters, each of which contributes to the size of the matrix weights used in the matrix calculations. Each of the billion parameters must be taken into account, that’s why the true/false recognition process requires running billions of iterations of the same matrix calculations. The iterations are not linked to each other, they are executed in parallel. GPUs are perfect for handling these types of operations because of their parallel processing capabilities. This is enabled by devoting more transistors to data processing.

Core Architecture: Tensor Cores

NVIDIA tensor cores are an example of how hardware architecture can effectively adapt to DL and AI. Tensor cores—special kinds of processors—were designed specifically for the mathematical calculations needed for deep learning, while earlier cores were also used for video rendering and 3D graphics. “Tensor” refers to tensor calculations, which are matrix calculations. A tensor is a mathematical object; if a tensor has two dimensions, it is a matrix. Below is a visualization of how a Tensor core calculates matrices.

Figure 2. Volta Tensor Core matrix calculations. Source: NVIDIA

NVIDIA Volta-based chips, like Tesla V100 with 640 tensor cores, became the first fully AI-focused GPUs, and they significantly influenced and accelerated the DL development industry. NVIDIA added tensor cores to its GPU chips in 2017, based on the Volta architecture.

Multi-GPU Clusters

Another GPU feature that drives DL training is the ability to increase throughput by building multi-GPU clusters, where many GPUs work simultaneously. This is especially useful when training large, scalable DL models with billions and trillions of parameters. The most effective approach for such training is to scale GPUs horizontally using interfaces such as NVLink and InfiniBand. These high-speed interfaces allow GPUs to exchange data directly, bypassing CPU bottlenecks.

Figure 3. NVIDIA H100 with NVLink GPU-to-GPU connections. Source: NVIDIA

For example, with the NVLink switch system, you can connect 256 NVIDIA GPUs in a cluster and get 57.6 Tbps of bandwidth. A cluster of that size can significantly reduce the time needed to train large DL models. Though there are several AI-focused GPU vendors on the market, NVIDIA is the undisputed leader, and makes the greatest contribution to DL. This is one of the reasons why Gcore uses NVIDIA chips for its AI GPU infrastructure.

GPU vs. CPU Comparison

A CPU executes tasks serially. Instructions are completed on a first-in, first-out (FIFO) basis. CPUs are better suited for serial task processing because they can use a single core to execute one task after another. CPUs also have a wider range of possible instructions than GPUs and can perform more tasks. They interact with more computer components such as ROM, RAM, BIOS, and input/output ports.

A GPU performs parallel processing, which means it processes tasks by dividing them between multiple cores. The GPU is a kind of advanced calculator: it can only receive a limited set of instructions and execute only graphics- and AI-related tasks, such as matrix multiplication (CPU can execute them too.) GPUs only need to interact with the display and memory. In the context of parallel computing, this is actually a benefit, as it allows for a greater number of cores devoted solely to these operations. This specialization enhances the GPU’s efficiency in parallel task execution.

An average consumer-grade GPU has hundreds of cores adapted to perform simple operations quickly and in parallel, while an average consumer-grade CPU has 2–16 cores adapted to complex sequential operations. Thus, the GPU is better suited for DL because it provides many more cores to perform the necessary computations faster than the CPU.

Figure 4. An average CPU has 2–16 cores, while an average GPU has hundreds

The parallel processing capabilities of the GPU are made possible by dedicating a larger number of transistors to data processing. Rather than relying on large data caches and complex flow control, GPUs can reduce memory access latencies with computation compared to CPUs. This helps avoid long memory access latencies, frees up more transistors for data processing rather than data caching, and, ultimately, benefits highly parallel computations.

Figure 5. GPUs devote more transistors to data processing than CPUs. Source: NVIDIA

GPUs also use video DRAM: GDDR5 and GDDR6. These are much faster than CPU DRAM: DDR3 and DDR4.

How GPU Outperforms CPU in DL Training

DL requires a lot of data to be transferred between memory and cores. To handle this, GPUs have a specially optimized memory architecture which allows for higher memory bandwidth than CPUs, even when GPUs technically have the same or less memory capacity. For example, a GPU with just 32 GB of HBM (high bandwidth memory) can deliver up to 1.2 Tbps of memory bandwidth and 14 TFLOPS of computing. In contrast, a CPU can have hundreds of GB of HBM, yet deliver only 100 Gbps bandwidth and 1 TFLOPS of computing.

Since GPUs are faster in most DL cases, they can also be cheaper when renting. If you know the approximate time you spend on DL training, you can simply check the prices of cloud providers to estimate how much money you will save by using GPUs instead of CPUs.

Depending on the configuration, models, and frameworks, GPUs often provide better performance than CPUs in DL training. Here are some direct comparisons:

Azure tested various cloud CPU and GPU clusters using the TensorFlow and Keras frameworks for five DL models of different sizes. In all cases, GPU cluster throughput consistently outperformed CPU cluster throughput, with improvements ranging from 186% to 804%.
Deci compared the NVIDIA Tesla T4 GPU and the Intel Cascade Lake CPU using the EfficientNet-B2 model. They found that the GPU was 3 times faster than the CPU.
IEEE published the results of a survey about running different types of neural networks on an Intel i5 9th generation CPU and an NVIDIA GeForce GTX 1650 GPU. When testing CNN (convolutional neural networks,) which are better suited to parallel computation, the GPU was between 4.9 and 8.8 times faster than the CPU. But when testing ANN (artificial neural networks,) the execution time of CPUs was 1.2 times faster than that of GPUs. However, GPUs outperformed CPUs as the data size increased, regardless of the NN architecture.

Using CPU for DL Training

The last comparison case shows that CPUs can sometimes be used for DL training. Here are a few more examples of this:

There are CPUs with 128 cores that can process some AI workloads faster than consumer GPUs.
Some algorithms allow the optimization DL model to perform better training on CPUs. For instance, Rice’s Brown School of Engineering has introduced an algorithm that makes CPUs 15 times faster than GPUs for some AI tasks.
There are cases where the precision of a DL model is not critical, like speech recognition under near-ideal conditions without any noise and interference. In such situations, you can train a DL model using floating-point weights (FP16, FP32) and then round them to integers. Because CPUs work better with integers than GPUs, they can be faster, although the results will not be as accurate.

However, using CPUs for DL training is still an unusual practice. Most DL models are adapted for parallel computing, i.e., for GPU hardware. Thus, building a CPU-based DL platform is a task that may be both difficult and unnecessary. It can take an unpredictable amount of time to select a multi-core CPU instance and then configure a CPU-adapted algorithm to train your model. By selecting a GPU instance, you get a platform that’s ready to build, train, and run your DL model.

How to Choose an Optimal GPU Configuration for Deep Learning

Choosing the optimal GPU configuration is basically a two-step process:

Determine the stage of deep learning you need to execute.
Choose a GPU server specification to match.

Note: We’ll only consider specification criteria for DL training, because DL inference (execution of a trained DL model,) as you’ll see, is not such a big deal as training.

1. Determine Which Stage of Deep Learning You Need

To choose an optimal GPU configuration, first you must understand which of two main stages of DL you will execute on GPUs: DL training, or DL inference. Training is the main challenge of DL, because you have to adjust the huge number (up to trillions) of matrix coefficients (weights.) The process is close to a brute-force search for the best combinations to give the best results (though some techniques help to reduce the number of computations, for example, the stochastic gradient descent algorithm.) Therefore, you need maximum hardware performance for training, and vendors make GPUs specifically designed for this. For example, the NVIDIA A100 and H100 GPUs are positioned as devices for DL training, not for inference.

Once you have calculated all the necessary matrix coefficients, the model is trained and ready for inference. At this stage, a DL model only needs to multiply the input data and the matrix coefficients once to produce a single result—for example, when a text-to-image AI generator generates an image according to a user’s prompt. Therefore, inference is always simpler than training in terms of math computations and required computational resources. In some cases, DL inference can be run on desktop GPUs, CPUs, and smartphones. An example is an iPhone with face recognition: the relatively modest GPU with 4–5 cores is sufficient for DL inference.

2. Choose the GPU Specification for DL Training

When choosing a GPU server or virtual GPU instance for DL training, it’s important to understand what training time is appropriate for you: hours, days, months, etc. To achieve this, you can count operations in the model or use information about reported training time and GPU model performance. Then, decide on the resources you need:

Memory size is a key feature. You need to specify at least as much GPU RAM as your DL model size. This is sufficient if you are not pressed for time to market, but if you’re under time pressure then it’s better to specify sufficient memory plus extra in reserve.
The number of tensor cores is less critical than the size of the GPU memory, since it only affects the computation speed. However, if you need to train a model faster, then the more cores the better.
Memory bandwidth is critical if you need to scale GPUs horizontally, for example, when the training time is too long, the dataset is huge, or the model is highly complex. In such cases, check whether the GPU instances support interconnects, such as NVLink or InfiniBand.

So, memory size is the most important thing when training a DL model: if you don’t have enough memory, you won’t be able to run the training. For example, to run the LLaMA model with 7 billion parameters at full precision, the Hugging Face technical team suggests using 28 GB of GPU RAM. This is the result of multiplying 7×4, where 7 is the tensor size (7B), and 4 is four bits for FP32 (the full-precision format.) For FP16 (half-precision), 14 GB is enough (7×2.) The full-precision format provides greater accuracy. The half-precision format provides less accuracy but makes training faster and more memory efficient.

Kubernetes as a Tool for Improving DL Inference

To improve DL inference, you can containerize your model and use a managed Kubernetes service with GPU instances as worker nodes. This will help you achieve greater scalability, resiliency, and cost savings. With Kubernetes, you can automatically scale resources as needed. For example, if the number of user prompts to your model spikes, you will need more compute resources for inference. In that case, more GPUs are allocated for DL inference only when needed, meaning you have no idle resources and no monetary waste.

Managed Kubernetes also reduces operational overhead and helps to automate cluster maintenance. A provider manages master nodes (the control plane.) You manage only the worker nodes on which you deploy your model, instead focusing on its development.

AI Frameworks that Power Deep Learning on GPUs

Various free, open-source AI frameworks help to train deep neural networks and are specifically designed to be run on GPU instances. All of the following frameworks also support NVIDIA’s Compute Unified Device Architecture (CUDA.) This is a parallel computing platform and API that enables the development of GPU-accelerated applications, including DL models. CUDA can significantly improve their performance.

TensorFlow is a library for ML and AI focused on deep learning model training and inference. With TensorFlow, developers can create dataflow graphs. Each graph node represents a matrix operation, and each connection between nodes is a matrix (tensor.) TensorFlow can be used with several programming languages, including Python, C++, JavaScript, and Java.

PyTorch is a machine-learning framework based on the Torch library. It provides two high-level features: tensor computing with strong acceleration via GPUs, and deep neural networks built on a tape-based auto-differentiation system. PyTorch is considered more flexible than TensorFlow because it gives developers more control over the model architecture.

MXNet is a portable and lightweight DL framework that can be used for DL training and inference not only on GPUs, but also on CPUs and TPUs (Tensor Processing Units.) MXNet supports Python, C++, Scala, R, and Julia.

PaddlePaddle is a powerful, scalable, and flexible framework that, like MXNet, can be used to train and deploy deep neural networks on a variety of devices. PaddlePaddle provides over 500 algorithms and pretrained models to facilitate rapid DL development.

Gcore’s Cloud GPU Infrastructure

As a cloud provider, Gcore offers AI GPU Infrastructure powered by NVIDIA chips:

Virtual machines and bare metal servers with consumer- and enterprise-grade GPUs
AI clusters based on servers with A100 and H100 GPUs
Managed Kubernetes with virtual and physical GPU instances that can be used as worker nodes

With Gcore’s GPU infrastructure, you can train and deploy DL models of any type and size. To learn more about our cloud services and how they can help in your AI journey, contact our team.

Conclusion

The unique design of GPUs, focused on parallelism and efficient matrix operations, makes them the perfect companion for the AI challenges of today and tomorrow, including deep learning. Their profound advantages over CPUs are underscored by their computational efficiency, memory bandwidth, and throughput capabilities.

When seeking a GPU, consider your specific deep learning goals, time, and budget. These help you to choose an optimal GPU configuration.

Book a GPU instance

Gcore Team

Content Team

Introducing AI Cloud Stack: turning GPU clusters into revenue-generating AI clouds

Enterprises and cloud providers face major roadblocks when trying to deploy GPU infrastructure at scale: long time-to-market, operational inefficiencies, and difficulty bringing new capacity to market profitably. Establishing AI environments with hyperscaler-grade functionality typically requires years of engineering effort, multiple partner integrations, and complex operational tooling.Not anymore.With Gcore AI Cloud Stack, organizations can transform bare Nvidia GPU clusters into a fully cloud-enabled environment—complete with orchestration, observability, billing, and go-to-market support—all in a fraction of the time it would take to build from scratch, maximizing GPU utilization.This proven solution marks the latest addition to the Gcore AI product suite, enabling enterprises and cloud providers to accelerate AI cloud deployment through better GPU utilization, monetization, reduced complexity, and hyperscaler-grade functionality in their own AI environments. Gcore AI Cloud Stack is already powering leading technology providers, including VAST and Nokia.Why we built AI Cloud StackBuying and efficiently operating GPUs at a large scale requires significant investment, time, and expertise. Most organizations need to hit the ground running, bypassing years of in-house R&D. Without a robust reference architecture, infrastructure and network preparation, 24/7 monitoring, dynamic resource allocation, orchestration abstraction, and clear paths to utilization or commercialization, enterprises can spend years before seeing ROI.“Gcore brings together the key pieces—compute, networking, and storage—into a usable stack. That integration helps service providers stand up AI clouds faster and onboard clients sooner, accelerating time to revenue. Combined with the advanced multi-tenant capabilities of VAST’s AI Operating System, it delivers a reliable, scalable, and futureproof AI infrastructure. Gcore offers operators a valuable option to move quickly without building everything themselves.”— Dan Chester, CSP Director EMEA, VAST DataAt Gcore, we understand that organizations across industries will continue to invest heavily in GPUs to power the next wave of AI innovation—meaning these challenges aren’t going away. AI Cloud Stack solves today’s challenges and anticipates tomorrow’s. It ensures that GPU infrastructure at the core of AI innovation delivers maximum value to enterprises.How AI Cloud Stack worksThis comprehensive solution is structured across three stages.1. Provision and launchGcore handles the complexities of initial deployment, from physical infrastructure setup to orchestration, enabling enterprises to go live quickly with a reliable GPU cloud.2. Operations and managementThe solution includes monitoring, orchestration, ticket management, and ongoing support to keep environments stable, secure, and efficient. This includes automated GPU failure handling and optimized resource management.3. Go-to-market supportUnlike other solutions, AI Cloud Stack goes beyond infrastructure. Building on Gcore’s experience as a trusted NVIDIA Cloud Provider (NCP), it helps customers sell their capacity, including through established reseller channels. This integrated GTM support ensures capacity doesn’t sit idle, losing value and potential.What sets Gcore apartUnlike many providers entering this market, Gcore has operated as a global cloud provider for over a decade and has been an early player in the global AI landscape. Gcore knows what it takes to build, scale, and sell cloud and AI services—because it has done it for customers and partners worldwide. Gcore AI Cloud Stack has already been deployed on thousands of NVIDIA Hopper GPUs across Europe to build a commercial-grade AI cloud with full orchestration, abstraction, and monetization layers. That real-world experience allows Gcore to deliver the infrastructure, operational playbook, and sales enablement customers need to succeed.“We’re pleased to collaborate with Gcore, a strong European ISV, to advance a networking reference architecture for AI clouds. Combining Nokia’s open, programmable, and reliable networking with Gcore’s cloud software accelerates deployable blueprints that customers can adopt across data centers and the edge.”— Mark Vanderhaegen, Head of Business Development, Data Center Networks, NokiaKey features of AI Cloud StackCloudification of GPU clusters: Transform raw infrastructure into cloud-like consumption: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), GPU as a Service (GPUaaS), or Model as a Service (MaaS).Gcore AI suite integration: Enable serverless inference and training capabilities through Gcore’s enterprise AI suite.Hyperscaler functionality: Built-in billing, observability, orchestration, and professional services deliver the tools CSPs and enterprises need to operate—similar to what they’re used to getting on public cloud.White-label options: Deliver capacity under your own brand while relying on Gcore’s proven global cloud backbone.NVIDIA AI Enterprise-ready: Integrate pretrained models, chatbots, and NVIDIA AI blueprints to accelerate time-to-market.The future of AI cloudsWith Gcore AI Cloud Stack, enterprises no longer need to spend years building the operational, technical, and commercial capabilities required to utilize and monetize GPU infrastructure. Instead, they can launch in a few months with a hyperscaler-grade solution designed for today’s AI demands.Whether you’re a cloud service provider, an enterprise investing in AI infrastructure, or a partner looking to accelerate GPU monetization, AI Cloud Stack gives you the speed, scalability, and GTM support you need.Ready to turn your GPU clusters into a fully monetized, production-grade AI cloud? Talk with our AI experts to learn how you can go from bare metal to model-as-a-service in months, not years.Get a customized consultation

Edge AI is your next competitive advantage: highlights from Seva Vayner’s webinar

Edge AI isn’t just a technical milestone. It’s a strategic lever for businesses aiming to gain a competitive advantage with AI.As AI deployments grow more complex and more global, central cloud infrastructure is hitting real-world limits: compliance barriers, latency bottlenecks, and runaway operational costs. The question for businesses isn’t whether they’ll adopt edge AI, but how soon.In a recent webinar with Mobile World Live, Seva Vayner, Gcore’s Product Director of Edge Cloud and AI, made the business case for edge inference as a competitive differentiator. He outlined what it takes to stay ahead in a world where speed, locality, and control define AI success.Scroll on to watch Seva explain why your infrastructure choices now shape your market position later.Location is everything: edge over cloudAI is no longer something globally operating businesses can afford to run from a central location. Regional regulations and growing user expectations mean models must be served as close to the user as possible. This reduces latency, but perhaps more importantly is essential for compliance with local laws.Edge AI also keeps costs down by avoiding costly international traffic routes. When your users are global but your infrastructure isn’t, every request becomes an expensive, high-latency journey across the internet.Edge inference solves three problems at once in an increasingly regionally fragmented AI landscape:Keeps compute near users for low latencyCuts down on international transit for reduced costsHelps companies stay compliant with local lawsPrivate edge: control over convenienceMany businesses started their AI journey by experimenting with public APIs like OpenAI’s. But as companies and their AI use cases mature, that’s not good enough anymore. They need full control over data residency, model access, and deployment architecture, especially in regulated industries or high-sensitivity environments.That’s where private edge deployments come in. Instead of relying on public endpoints and shared infrastructure, organizations can fully isolate their AI environments, keeping data secure and models proprietary.This approach is ideal for healthcare, finance, government, and any sector where data sovereignty and operational security are critical.Optimizing edge AI: precision over powerDeploying AI at the edge requires right-sizing your infrastructure for the models and tasks at hand. That’s both technically smarter and far more cost-effective than throwing maximum power and size at every use case.Making smart trade-offs allows businesses to scale edge AI sustainably by using the right hardware for each use case.AI at the edge helps businesses deliver the experience without the excess. With the control that the edge brings, hardware costs can be cut by using exactly what each device or location requires, reducing financial waste.Final takeawayAs Seva put it, AI infrastructure decisions are no longer just financial; they’re part of serious business strategy. From regulatory compliance to operational cost to long-term scalability, edge inference is already a necessity for businesses that plan to serve AI at scale and get ahead in the market.Gcore offers a full suite of public and private edge deployment options across six continents, integrated with local telco infrastructure and optimized for real-time performance. Learn more about Everywhere Inference, our edge AI solution, or get in touch to see how we can help tailor a deployment model to your needs.Ready to get started? Deploy a model in just three clicks with Gcore Everywhere Inference.Discover Everywhere Inference

From budget strain to AI gain: Watch how studios are building smarter with AI

Game development is in a pressure cooker. Budgets are ballooning, infrastructure and labor costs are rising, and players expect more complexity and polish with every release. All studios, from the major AAAs to smaller indies, are feeling the strain.But there is a way forward. In a recent webinar, Sean Hammond, Territory Manager for the UK and Nordics at Gcore, explained how AI is reshaping game development workflows and how the right infrastructure strategy can reduce costs, speed up production, and create better player experiences.Scroll on to watch key moments from Sean's talk and explore how studios can make AI work for them.Rising costs are threatening game developmentGame revenue has slowed, but development costs continue to rise. Some AAA titles now surpass $100 million in development budgets. The complexity of modern games demands more powerful servers, scalable infrastructure, and larger teams, making the industry increasingly unsustainable.Personnel and infrastructure costs are also climbing. Developers, artists, and QA testers with specialized skills are in high demand, as are technologies like VR, AR, and AI. Studios are also having to invest more in cybersecurity to protect player data, detect cheating, and safeguard in-game economies.AI is revolutionizing GameDev, even without a perfect use caseWhile the perfect use case for AI in gaming may not have been found yet, it’s already transforming how games are built, tested, and personalized.Sean highlighted emerging applications, including:Smarter QA testingAI-driven player personalizationReal-time motion and animationAccelerated environment and character designMultilingual localizationAdaptive game balancingStudios are already applying these technologies to reduce production timelines and improve immersion.The challenge of secure, scalable AI adoptionOf course, AI adoption doesn’t come without its challenges. Chief among them is security. Public models pose risks: no studio wants their proprietary assets to end up training a competitor’s model.The solution? Deploy AI models on infrastructure you trust so you’re in complete control. That’s where Gcore comes in.Gcore Everywhere Inference reduces compute costs and infrastructure bloat by allowing you to deploy only what you need, where you need it.The future of gaming is AI at scaleTo power real-time player experiences, your studio needs to deploy AI globally, close to your users.Gcore Everywhere Inference lets you deploy models worldwide at the edge with minimal latency because data is not routed back to central servers. This means fast, responsive gameplay and a new generation of real-time, AI-driven features.As a company originally built by gamers, we’ve developed AI solutions with gaming studios in mind. Here’s what we offer:Global edge inference for real-time gameplay: Deploy your AI models close to players worldwide, enabling fast, responsive player experiences without routing data to central servers.Full control over AI model deployment and IP protection: Avoid public APIs and retain full ownership of your assets with on-prem options, preventing your proprietary data from being available to competitors.Scalable, cost-efficient infrastructure tailored to gaming workloads: Deploy only what you need to avoid overprovisioning and reduce compute costs without sacrificing performance.Enhanced player retention through AI-driven personalization and matchmaking: Real-time inference powers smarter NPCs and dynamic matchmaking, improving engagement and keeping players coming back for more.Deploy models in 3 clicks and under 10 seconds: Our developer-friendly platform lets you go from trained model to global deployment in seconds. No complex DevOps setup required.Final takeawayAI is advancing game development fast, but only if it’s deployed right. Gcore offers scalable, secure, and cost-efficient AI infrastructure that helps studios create smarter, faster, and more immersive games.Want to see how it works? Deploy your first model in just a few clicks.Check out our blog on how AI is transforming gaming in 2025

How AI-enhanced content moderation is powering safe and compliant streaming

As streaming experiences a global boom across platforms, regions, and industries, providers face a growing challenge: how to deliver safe, respectful, and compliant content delivery at scale. Viewer expectations have never been higher, likewise the regulatory demands and reputational risks.Live content in particular leaves little room for error. A single offensive comment, inappropriate image, or misinformation segment can cause long-term damage in seconds.Moderation has always been part of the streaming conversation, but tools and strategies are evolving rapidly. AI-powered content moderation is helping providers meet their safety obligations while preserving viewer experience and platform performance.In this article, we explore how AI content moderation works, where it delivers value, and why streaming platforms are adopting it to stay ahead of both audience expectations and regulatory pressures.Real-time problems require real-time solutionsHuman moderators can provide accuracy and context, but they can’t match the scale or speed of modern streaming environments. Live streams often involve thousands of viewers interacting at once, with content being generated every second through audio, video, chat, or on-screen graphics.Manual review systems struggle to keep up with this pace. In some cases, content can go viral before it is flagged, like deepfakes that circulated on Facebook leading up to the 2025 Canadian election. In others, delays in moderation result in regulatory penalties or customer churn, like X’s 2025 fine under the EU Digital Services Act for shortcomings in content moderation and algorithm transparency. This has created a demand for scalable solutions that act instantly, with minimal human intervention.AI-enhanced content moderation platforms address this gap. These systems are trained to identify and filter harmful or non-compliant material as it is being streamed or uploaded. They operate across multiple modalities—video frames, audio tracks, text inputs—and can flag or remove content within milliseconds of detection. The result is a safer environment for end users.How AI moderation systems workModern AI moderation platforms are powered by machine learning algorithms trained on extensive datasets. These datasets include a wide variety of content types, languages, accents, dialects, and contexts. By analyzing this data, the system learns to identify content that violates platform policies or legal regulations.The process typically involves three stages:Input capture: The system monitors live or uploaded content across audio, video, and text layers.Pattern recognition: It uses models to identify offensive content, including nudity, violence, hate speech, misinformation, or abusive language.Contextual decision-making: Based on confidence thresholds and platform rules, the system flags, blocks, or escalates the content for review.This process is continuous and self-improving. As the system receives more inputs and feedback, it adapts to new forms of expression, regional trends, and platform-specific norms.What makes this especially valuable for streaming platforms is its low latency. Content can be flagged and removed in real time, often before viewers even notice. This is critical in high-stakes environments like esports, corporate webinars, or public broadcasts.Multi-language moderation and global streamingStreaming audiences today are truly global. Content crosses borders faster than ever, but moderation standards and cultural norms do not. What’s considered acceptable in one region may be flagged as offensive in another. A word that is considered inappropriate in one language might be completely neutral in another. A piece of nudity in an educational context may be acceptable, while the same image in another setting may not be. Without the ability to understand nuance, AI systems risk either over-filtering or letting harmful content through.That’s why high-quality moderation platforms are designed to incorporate context into their models. This includes:Understanding tone, not just keywordsRecognizing culturally specific gestures or idiomsAdapting to evolving slang or coded languageApplying different standards depending on content type or target audienceThis enables more accurate detection of harmful material and avoids false positives caused by mistranslation.Training AI models for multi-language support involves:Gathering large, representative datasets in each languageTeaching the model to detect content-specific risks (e.g., slurs or threats) in the right cultural contextContinuously updating the model as language evolvesThis capability is especially important for platforms that operate in multiple markets or support user-generated content. It enables a more respectful experience for global audiences while providing consistent enforcement of safety standards.Use cases across the streaming ecosystemAI moderation isn’t just a concern for social platforms. It plays a growing role in nearly every streaming vertical, including the following:Live sports: Real-time content scanning helps block offensive chants, gestures, or pitch-side incidents before they reach a wide audience. Fast filtering protects the viewer experience and helps meet broadcast standards.Esports: With millions of viewers and high emotional stakes, esports platforms rely on AI to remove hate speech and adult content from chat, visuals, and commentary. This creates a more inclusive environment for fans and sponsors alike.Corporate live events: From earnings calls to virtual town halls, organizations use AI moderation to help ensure compliance with internal communication guidelines and protect their reputation.Online learning: EdTech platforms use AI to keep classrooms safe and focused. Moderation helps filter distractions, harassment, and inappropriate material in both live and recorded sessions.On-demand entertainment: Even outside of live broadcasts, moderation helps streaming providers meet content standards and licensing obligations across global markets. It also ensures user-submitted content (like comments or video uploads) meets platform guidelines.In each case, the shared goal is to provide a safe and trusted streaming environment for users, advertisers, and creators.Balancing automation with human oversightAI moderation is a powerful tool, but it shouldn’t be the only one. The best systems combine automation with clear review workflows, configurable thresholds, and human input.False positives and edge cases are inevitable. Giving moderators the ability to review, override, or explain decisions is important for both quality control and user trust.Likewise, giving users a way to appeal moderation decisions or report issues ensures that moderation doesn’t become a black box. Transparency and user empowerment are increasingly seen as part of good platform governance.Looking ahead: what’s next for AI moderationAs streaming becomes more interactive and immersive, moderation will need to evolve. AI systems will be expected to handle not only traditional video and chat, but also spatial audio, avatars, and real-time user inputs in virtual environments.We can also expect increased demand for:Personalization, where viewers can set their own content preferencesIntegration with platform APIs for programmatic content governanceCross-platform consistency to support syndicated content across partnersAs these changes unfold, AI moderation will remain central to the success of modern streaming. Platforms that adopt scalable, adaptive moderation systems now will be better positioned to meet the next generation of content challenges without compromising on speed, safety, or user experience.Keep your streaming content safe and compliant with GcoreGcore Video Streaming offers AI Content Moderation that satisfies today’s digital safety concerns while streamlining the human moderation process.To explore how Gcore AI Content Moderation can transform your digital platform, we invite you to contact our streaming team for a demonstration. Our docs provide guidance for using our intuitive Gcore Customer Portal to manage your streaming content. We also provide a clear pricing comparison so you can assess the value for yourself.Embrace the future of content moderation and deliver a safer, more compliant digital space for all your users.Try AI Content Moderation for freeTry AI Content Moderation for free

Deploy GPT-OSS-120B privately on Gcore

OpenAI’s release of GPT-OSS-120B is a turning point for LLM developers. It’s a 120B parameter model trained from scratch, licensed for commercial use, and available with open weights. This is a serious asset for serious builders.Gcore now supports private GPT-OSS-120B deployments via our Everywhere Inference platform. That means you can stand up your own endpoint in minutes, run inference at scale, and control the full stack, without API limits, vendor lock-in, or hidden usage fees. Just fast, secure, controlled deployment on your terms. Deploy now in three clicks or read on to learn more.Why GPT-OSS-120B is big news for buildersThis model changes the game for anyone developing AI apps, platforms, or infrastructure. It brings GPT-3-level reasoning to the open-source ecosystem and frees developers from closed APIs.With GPT-OSS-120B, you get:Full access to model weights and architectureSelf-hosting for maximum data control and privacySupport for fine-tuning and model editingOffline deployment for secure or air-gapped useMassive cost savings at scaleYou can deploy in any Gcore region (or leverage Gcore’s three-click serverless inference on your own infrastructure), route traffic through your own stack, and fully control load, latency, and logs. This is LLM deployment for real-world apps, not just playground prompts.How to deploy GPT-OSS-120B with Gcore Everywhere InferenceGcore Everywhere Inference gives you a clean path from open model to production endpoint. You can spin up a dedicated deployment in just three clicks. We offer configuration options to suit your business needs:Choose your location (cloud or on-prem)Integrate via standard APIs (OpenAI-compatible)Control usage, autoscale, and costsDeploying GPT-OSS-120B on Gcore takes just three clicks in the Gcore Customer Portal.There are no shared endpoints. You get dedicated compute, low-latency routing, and full control and observability.You can also bring your own trained variant if you’ve fine-tuned GPT-OSS-120B elsewhere. We’ll help you host it reliably, close to your users.Use cases: where GPT-OSS-120B fits bestCommercial GPTs still outperform OSS models on some general tasks, but GPT-OSS-120B gives you control, portability, and flexibility where it counts. Most importantly, it gives you the ability to build privacy-sensitive applications.Great fits include:Internal dev tools and copilotsRetrieval-augmented generation (RAG) pipelinesSecure, private enterprise assistantsData-sensitive, on-prem AI workloadsModels requiring full customization or fine-tuningIt’s especially relevant for finance, healthcare, government, and legal teams operating under strict compliance rules.Deploy GPT-OSS-120B todayWant to learn more about GPT-OSS-120B and why Gcore is an ideal provider for deployment? Get all the information you need on our dedicated page.And if you’re ready to deploy in just three clicks, head on over to the Gcore Customer Portal. GPT-OSS-120B is waiting for you in the Application Catalog.Learn more about deploying GPT-OSS-120B on Gcore

Announcing new tools, apps, and regions for your real-world AI use cases

Three updates, one shared goal: helping builders move faster with AI. Our latest releases for Gcore Edge AI bring real-world AI deployments within reach, whether you’re a developer integrating genAI into a workflow, an MLOps team scaling inference workloads, or a business that simply needs access to performant GPUs in the UK.MCP: make AI do moreGcore’s MCP server implementation is now live on GitHub. The Model Context Protocol (MCP) is an open standard, originally developed by Anthropic, that turns AI models into agents that can carry out real-world tasks. It allows you to plug genAI models into everyday tools like Slack, email, Jira, and databases, so your genAI can read, write, and reason directly across systems. Think of it as a way to turn “give me a summary” into “send that summary to the right person and log the action.”“AI needs to be useful, not just impressive. MCP is a critical step toward building AI systems that drive desirable business outcomes, like automating workflows, integrating with enterprise tools, and operating reliably at scale. At Gcore, we’re focused on delivering that kind of production-grade AI through developer-friendly services and top-of-the-range infrastructure that make real-world deployment fast and easy.” — Seva Vayner, Product Director of Edge Cloud and AI, GcoreTo get started, clone the repo, explore the toolsets, and test your own automations.Gcore Application Catalog: inference without overheadWe’ve upgraded the Gcore Model Catalog into something even more powerful: an Application Catalog for AI inference. You can still deploy the latest open models with three clicks. But now, you can also tune, share, and scale them like real applications.We’ve re-architected our inference solution so you can:Run prefill and decode stages in parallelShare KV cache across pods (it’s not tied to individual GPUs) from August 2025Toggle WebUI and secure API independently from August 2025These changes cut down on GPU memory usage, make deployments more flexible, and reduce time to first token, especially at scale. And because everything is application-based, you’ll soon be able to optimize for specific business goals like cost, latency, or throughput.Here’s who benefits:ML engineers can deploy high-throughput workloads without worrying about memory overheadBackend developers get a secure API, no infra setup neededProduct teams can launch demos instantly with the WebUI toggleInnovation labs can move from prototype to production without reconfiguringPlatform engineers get centralized caching and predictable scalingThe new Application Catalog is available now through the Gcore Customer Portal.Chester data center: NVIDIA H200 capacity in the UKGcore’s newest AI cloud region is now live in Chester, UK. This marks our first UK location in partnership with Northern Data. Chester offers 2000 NVIDIA H200 GPUs with BlueField-3 DPUs for secure, high-throughput compute on Gcore GPU Cloud, serving your training and inference workloads. You can reserve your H200 GPU immediately via the Gcore Customer Portal.This launch solves a growing problem: UK-based companies building with AI often face regional capacity shortages, long wait times, or poor performance when routing inference to overseas data centers. Chester fixes that with immediate availability on performant GPUs.Whether you’re training LLMs or deploying inference for UK and European users, Chester offers local capacity, low latency, and impressive capacity and availability.Next stepsExplore the MCP server and start building agentic workflowsTry the new Application Catalog via the Gcore Customer PortalDeploy your workloads in Chester for high-performance UK-based computeDeploy your AI workload in three clicks today!