Home
Blog
Business Benefits of AI Inference at the Edge

Business Benefits of AI Inference at the Edge

March 15, 2024

6 min read

Business Benefits of AI Inference at the Edge

Transitioning AI inferencing from the cloud to the edge enhances real-time decision making by bringing data processing closer to data sources. For businesses, this shift significantly reduces latency, directly enhancing user experience by enabling near-instant content delivery and real-time interaction. This article explores edge AI’s business benefits across various industries and applications, emphasizing the importance of immediate data analysis in driving business success.

How Does AI Inference at the Edge Impact Businesses?

Deploying AI models at the edge means that during AI inference, data is processed on-site or near the user, enabling real-time, near-instant data processing and decision making. AI inference is the process of applying a trained model’s knowledge to new, unseen data, becomes significantly more efficient at the edge. Low-latency inference provided by edge AI is essential for businesses that rely on up-to-the-moment data analysis to inform decisions, improve customer experiences, and maintain a competitive edge.

How It Works: Edge vs Cloud

Inference at the edge removes the delays that characterize the transmission of information to distant cloud servers in the traditional model that preceded edge inference. It does so by reducing the physical distance between the device requesting AI inference, and the server where inference is performed. This enables applications to respond to changes or inputs almost instantaneously.

Benefits of AI Inference at the Edge

Shifting to edge AI offers significant benefits for businesses across industries. (In the next section, we’ll look at industry-specific benefits and use cases.)

Real-Time Data Processing

Edge AI transforms business operations by enabling data to be processed almost instantly at or near its source, crucial for sectors where time is of the essence, like gaming, healthcare, and entertainment. This technology dramatically reduces the time lag between data collection and analysis, providing immediate actionable information and allowing businesses to gain real-time insights, make swift decisions, and optimize operations.

Bandwidth Efficiency

By processing data locally, edge AI minimizes the volume of data that needs to be transmitted across networks. This reduction in data transmission alleviates network congestion and improves system performance, critical for environments with high data traffic.

For businesses, this means operations remain uninterrupted and responsive even at peak times and without needing to implement costly network upgrades. This directly translates into tangible financial savings combined with more reliable service delivery for their customers—a win-win scenario from inference at the edge.

Reduced Costs

Edge AI helps businesses minimize the need for frequent data transfers to cloud services, which substantially lowers bandwidth, infrastructure, and storage needs for extensive data management. As a result, this approach makes the entire data-handling process more cost-efficient.

Accessibility and Reliability

Edge AI’s design allows for operation even without consistent internet access by deploying AI applications on local devices, without needing to connect to distant servers. This ensures stable performance and dependability, enabling businesses to maintain high service standards and operational continuity, regardless of geographic or infrastructure constraints.

Enhanced Privacy and Security

Despite spending copious amounts of time and sharing experiences on platforms like TikTok and X, today’s users are increasingly privacy-conscious. There’s good reason for this, as data breaches are on the rise, costing organizations of all sizes millions and compromising individuals’ data. For example, the widely publicized T-Mobile breach in 2022 resulted in $350 million in customer payouts. Companies providing AI-driven capabilities have a strong hold on user engagement and generally promise users control over how models are used, respecting privacy and content ownership. Taking AI data to the edge can contribute to such privacy efforts.

Edge AI’s local data processing means that data analysis can occur directly on the device where data is collected, rather than being sent to remote servers. This proximity significantly reduces the risk of data interception or unauthorized access, as less data is transmitted over networks.

Processing data locally—either on individual devices or a nearby server—makes adherence to privacy regulations and security protocols like GDPR easier. Such regulations require that sensitive data be kept within specific regions. Edge AI achieves this high level of compliance by enabling companies to process data within the same region or country where it’s generated.

For example, a global AI company could have a French user’s data processed by a French edge AI server, and a Californian user’s data processed by a server located in California. This way, the data processing of the two users would automatically adhere to their local laws: the French user’s would be performed in accordance with the European standard GDPR, and the Californian’s according to CCPA and CPRA.

How Edge AI Meets Industries’ Low-Latency Data Processing Needs

While edge AI presents significant advantages across industries, its adoption is more critical in some use cases than others, particularly those that require speed and efficiency to gain and maintain a competitive advantage. Let’s look at some industries where inference at the edge is particularly crucial.

Entertainment

In the entertainment industry, edge AI is allowing providers to offer highly personalized content and interactive features directly to users. It enables significant added value in the form of live sports updates, in-context player information, interactive movie features, real-time user preference analysis, and tailored recommendation generation by optimizing bandwidth usage and cutting out the lag time linked to using remote servers. These capabilities promote enhanced viewer engagement and a more immersive and satisfying entertainment experience.

GenAI

Imagine a company that revolutionizes personalized content by enabling users to generate beautiful, customized images through artificial intelligence, integrating personal elements like pictures they’ve taken of themselves, products, pets, or other personal items. Applications like these already exist.

Today’s users expect immediate responses in their digital interactions. To keep its users engaged and excited, such a company must find ways to meet its users’ expectations or risk losing them to competitors.

The local processing of this entertainment-geared data to prompt image generation tightens its security, as sensitive information doesn’t have to travel over the internet to distant servers. Additionally, by processing user requests directly on devices or nearby servers, edge AI can minimize delays in image generation, making the experience of customizing images fast and allowing for real-time interaction with the application. The result: a deeper, more satisfying connection between users and the technology.

Manufacturing

In manufacturing, edge AI modernizes predictive maintenance and quality control by bringing intelligent processing capabilities right to the factory floor. This allows for real-time monitoring of equipment, leveraging advanced machine vision and the continuous and detailed analysis of vibration, temperature, and acoustic data from machinery to detect quality deviations. The practical impact is a reduction in defects and reduction in downtime via predictive maintenance. Inference at the edge allows the real-time response that’s required for this.

Major companies have already adopted edge AI in this way. For instance, Procter & Gamble’s chemical mix tanks are monitored by edge AI solutions that immediately notify floor managers of quality deviations, preventing flawed products from continuing down the manufacturing line. Similarly, BMW employs a combination of edge computing and AI to achieve a real-time overview of its assembly lines, ensuring the efficiency and safety of its manufacturing operations.

Manufacturing applications of inference at the edge significantly reduce operational costs by optimizing equipment maintenance and quality control. The technology’s ability to process data on-site or nearby transforms traditional manufacturing into a highly agile, cost-effective, and reliable operation, setting a new benchmark for the industry worldwide.

Healthcare

In healthcare, AI inference at the edge addresses critical concerns, such as privacy and security, through stringent data encryption and anonymization techniques, ensuring patient data remains confidential. Edge AI’s compatibility with existing healthcare IT systems, achieved through interoperable standards and APIs, enables seamless integration with current infrastructures. Overall, the impact of edge AI on healthcare is improved care delivery via the enabling of immediate, informed medical decisions based on real-time data insights.

Gcore partnered with a healthcare provider who needed to process sensitive medical data to generate an AI second opinion, particularly in oncological cases. Due to patient confidentiality, the data couldn’t leave the country. As such, the healthcare provider’s best option to meet regulatory compliance while maintaining high performance was to deploy an edge solution connected to their internal system and AI model. With 160+ strategic global locations and proven adherence to GDPR and ISO 27001 standards, we were able to offer the healthcare provider the edge AI advantage they needed.

The result:

Real-time processing and reduced latency: For the healthcare provider, every second counts, especially in critical oncological cases. By deploying a large model at the edge, close to the hospital’s headquarters, we enabled fast insights and responses.
Enhanced security and privacy: Maintaining the integrity and confidentiality of patient data was a non-negotiable in this case. By processing the data locally, we ensured adherence to strict privacy standards like GDPR, without sacrificing performance.
Efficiency and cost reduction: We minimized bandwidth usage by reducing the need for constant data transmission to distant servers—critical for rapid and reliable data turnover—while minimizing the associated costs.

Retail

In retail, edge AI brings precision to inventory management and personalizes the customer experience across a variety of operations. By analyzing data from sensors and cameras in real-time, edge AI predicts restocking needs accurately, ensuring that shelves are always filled with the right products. This technology also powers smart checkout systems, streamlining the purchasing process by eliminating the need for manual scanning, thus reducing wait times and improving customer satisfaction. Retail chatbots and AI customer service bring these benefits to e-commerce.

Inference at the edge make it possible to employ computer vision to understand customer behaviors and preferences in real time, enabling retailers to optimize store layouts and product placements effectively. This insight helps to create a shopping environment that encourages purchases and enhances the overall customer journey. Retailers leveraging edge AI can dynamically adjust to consumer trends and demands, making operations more agile and responsive.

Conclusion

AI inferencing at the edge offers businesses across various industries the ability to process data in real time, directly at the source. This capability reduces latency while enhancing operational efficiency, security, and customer satisfaction, allowing businesses to set a new standard in leveraging technology to gain a competitive advantage.

Gcore is at the forefront of this technological evolution, activating AI inference at the edge across a global network designed to minimize latency and maximize performance. With advanced L40S GPU-based computing resources and a comprehensive list of open-source models, Gcore Edge AI provides a robust, cutting-edge platform for large AI model deployment.

Explore Gcore AI GPU Cloud Infrastructure

Gcore Team

Content Team

Gcore partners with AVEQ to elevate streaming performance monitoring

At Gcore, delivering exceptional streaming experiences to users across our global network is at the heart of what we do. We're excited to share how we're taking our CDN performance monitoring to new heights through our partnership with AVEQ and their innovative Surfmeter solution.Operating a massive global network spanning 210 points of presence across six continents comes with unique challenges. While our globally distributed caching infrastructure already ensures optimal performance for end-users, we recognized the need for deeper insights into the complex interactions between applications and our underlying network. We needed to move beyond traditional server-side monitoring to truly understand what our customers' users experience in the real world.Real-world performance visibilityThat's where AVEQ's Surfmeter comes in. We're now using Surfmeter to gain unprecedented visibility into our network performance through automated, active measurements that simulate actual streaming video quality, exactly as end-users experience it.This isn't about checking boxes or reviewing server logs. It's about measuring what users see on their screens at home. With Surfmeter, our engineering teams can identify and resolve potential bottlenecks or suboptimal configurations, and collaborate more effectively with our customers to continuously improve Quality of Experience (QoE).How we use SurfmeterAVEQ helps us simulate and analyze real-world scenarios where users access different video streams. Their software runs both on network nodes close to our data center CDN caches and at selected end-user locations with genuine ISP connections.What sets Surfmeter apart is its authentic approach: it opens video streams from the same platforms and players that end-users rely on, ensuring measurements truly represent real-world conditions. Unlike monitoring solutions that simply check stream availability, Surfmeter doesn't make assumptions or use third-party playback engines. Instead, it precisely replicates how video players request and decode data served from our CDN.Rapid issue resolutionWhen performance issues arise, Surfmeter provides our engineers with the deep insights needed to quickly identify root causes. Whether the problem lies within our CDN, with peering partners, or on the server side, we can pinpoint it with precision.By monitoring individual video requests, including headers and timings, and combining this data with our internal logging, we gain complete visibility and observability into our entire streaming pipeline. Surfmeter can also perform ping and traceroute tests from the same device, measuring video QoE, allowing our engineers to access all collected data through one API rather than manually connecting to devices for troubleshooting.Competitive benchmarking and future capabilitiesSurfmeter also enables us to benchmark our performance against other services and network providers. By deploying Surfmeter probes at customer-like locations, we can measure streaming from any source via different ISPs.This partnership reflects our commitment to transparency and data-driven service excellence. By leveraging AVEQ's Surfmeter solution, we ensure that our customers receive the best possible streaming performance, backed by objective, end-user-centric insights.Learn more about Gcore CDN

How we engineered a single pipeline for LL-HLS and LL-DASH

Viewers in sports, gaming, and interactive events expect real-time, low-latency streaming experiences. To deliver this, the industry has rallied around two powerful protocols: Low-Latency HLS (LL-HLS) and Low-Latency DASH (LL-DASH).While they share a goal, their methods are fundamentally different. LL-HLS delivers video in a sequence of tiny, discrete files. LL-DASH delivers it as a continuous, chunked download of a larger file. This isn't just a minor difference in syntax; it implies completely different behaviors for the packager, the CDN, and the player.This duality presents a major architectural challenge: How do you build a single, efficient, and cost-effective pipeline that can serve both protocols simultaneously from one source?At Gcore, we took on this unification problem. The result is a robust, single-source pipeline that delivers streams with a glass-to-glass latency of approximately 2.0 seconds for LL-DASH and 3.0 seconds for LL-HLS. This is the story of how we designed it.Understanding the dualityTo build a unified system, we first had to deeply understand the differences in how each protocol operates at the delivery level.LL-DASH: the continuous feedMPEG-DASH has always been flexible, using a single manifest file to define media segments by their timing. Low-Latency DASH builds on this by using Chunked CMAF segments.Imagine a file that is still being written to on the server. Instead of waiting for the whole file to be finished, the player can request it, and the server can send it piece by piece using Chunked Transfer Encoding. The player receives a continuous stream of bytes and can start playback as soon as it has enough data.Single, long-lived files: A segment might be 2–6 seconds long, but it’s delivered as it’s being generated.Timing-based requests: The player knows when a segment should be available and requests it. The server uses chunked transfer to send what it has so far.Player-driven latency: The manifest contains a targetLatency attribute, giving the player a strong hint about how close to the live edge it should play.LL-HLS: the rapid-fire deliveryLL-HLS takes a different approach. It extends the traditional playlist-based HLS by breaking segments into even smaller chunks called Parts.Think of it like getting breaking news updates. The server pre-announces upcoming Parts in the manifest before they are fully available. The player then requests a Part, but the server holds that request open until the Part is ready to be delivered at full speed. This is called a Blocking Playlist Reload.Many tiny files (Parts): A 2-second segment might be broken into four 0.5-second Parts, each requested individually.Manifest-driven updates: The server constantly updates the manifest with new Parts, and uses special tags like #EXT-X-PART-INF and #EXT-X-SERVER-CONTROL to manage delivery.Server-enforced timing: The server controls when the player receives data by holding onto requests, which helps synchronize all viewers.A simplified diagram visually comparing the LL-HLS delivery of many small parts versus the LL-DASH chunked transfer of a single, larger segment over the same time period.These two philosophies demand different things from a CDN. LL-DASH requires the CDN to intelligently cache and serve partially complete files. LL-HLS requires the CDN to handle a massive volume of short, bursty requests and hold connections open for manifest updates. A traditional CDN is optimized for neither.Forging a unified strategyWith two different delivery models, where do you start? You find the one thing they both depend on: the keyframe.Playback can only start from a keyframe (or I-frame). Therefore, the placement of keyframes, which defines the Group of Pictures (GOP), is the foundational layer that both protocols must respect. By enforcing a consistent keyframe interval on the source stream, we could create a predictable media timeline. This timeline can then be described in two different “languages” in the manifests for LL-HLS and LL-DASH.A single timeline with consistent GOPs being packaged for both protocols.This realization led us to a baseline configuration, but each parameter involved a critical engineering trade-off:GOP: 1 second. We chose a frequent, 1-second GOP. The primary benefit is extremely fast stream acquisition; a player never has to wait more than a second for a keyframe to begin playback. The trade-off is a higher bitrate. A 1-second GOP can increase bitrate by 10–15% compared to a more standard 2-second GOP because you're storing more full-frame data. For real-time, interactive use cases, we prioritized startup speed over bitrate savings.Segment Size: 2 seconds. A 2-second segment duration provides a sweet spot. For LL-DASH and modern HLS players, it's short enough to keep manifest sizes manageable. For older, standard HLS clients, it prevents them from falling too far behind the live edge, keeping latency reduced even on legacy players.Part Size: 0.5 seconds. For LL-HLS, this means we deliver four updates per segment. This frequency is aggressive enough to achieve sub-3-second latency while being coarse enough to avoid overwhelming networks with excessive request overhead, which can happen with part durations in the 100–200ms range.Cascading challenges through the pipeline1. Ingest: predictability is paramountTo produce a clean, synchronized output, you need a clean, predictable input. We found that the encoder settings of the source stream are critical. An unstable source with a variable bitrate or erratic keyframe placement will wreck any attempt at low-latency delivery.For our users, we recommend settings that prioritize speed and predictability over compression efficiency:Rate control: Constant Bitrate (CBR)Keyframe interval: A fixed interval (e.g., every 30 frames for 30 FPS, to match our 1s GOP).Encoder tune: zerolatencyAdvanced options: Disable B-frames (bframes=0) and scene-cut detection (scenecut=0) to ensure keyframes are placed exactly where you command them to be.Here is an example ffmpeg command in Bash that encapsulates these principles:ffmpeg -re -i "source.mp4" -c:a aac -c:v libx264 \ -profile:v baseline -tune zerolatency -preset veryfast \ -x264opts "bframes=0:scenecut=0:keyint=30" \ -f flv "rtmp://your-ingest-url"2. Transcoding and packagingOur transcoding and Just-In-Time Packaging (JITP) layer is where the unification truly happens. This component does more than just convert codecs; it has to operate on a stream that is fundamentally incomplete.The primary challenge is that the packager must generate manifests and parts from media files that are still being written by the transcoder. This requires a tightly-coupled architecture where the packager can safely read from the transcoder's buffer.To handle the unpredictable nature of live sources, especially user-generated content via WebRTC, we use a hybrid workflow:GPU Workers (Nvidia/Intel): These handle the heavy lifting of decoding and encoding. Offloading to GPU hardware is crucial for minimizing processing latency and preserving advanced color formats like HDR+.Software Workers and Filters: These provide flexibility. When a live stream from a mobile device suddenly changes resolution or its framerate drops due to a poor connection, a rigid hardware pipeline would crash. Our software layer can handle these context changes gracefully, for instance, by scaling the erratic source and overlaying it on a stable, black-background canvas, meaning the output stream never stops.This makes our JITP a universal packager, creating three synchronized content types from a single, resilient source:LL-DASH (CMAF)LL-HLS (CMAF)Standard HLS (MPEG-TS) for backward compatibility3. CDN delivery: solving two problems at onceThis was the most intensive part of the engineering effort. Our CDN had to be taught how to excel at two completely different, high-performance tasks simultaneously.For LL-DASH, we developed a custom caching module we call chunked-proxy. When the first request for a new .m4s segment arrives, our edge server requests it from the origin. As bytes flow in from the origin, the chunked-proxy immediately forwards them to the client. When a second client requests the same file, our edge server serves all the bytes it has already cached and then appends the new bytes to both clients' streams simultaneously. It’s a multi-client cache for in-flight data.For LL-HLS, the challenges were different:Handling Blocked Requests: Our edge servers needed to be optimized to hold thousands of manifest requests open for hundreds of milliseconds without consuming excessive resources.Intelligent Caching: We needed to correctly handle cache statuses (MISS, EXPIRED) for manifests to ensure only one request goes to the origin per update, preventing a "thundering herd" problem.High Request Volume: LL-HLS generates a storm of requests for tiny part-files. Our infrastructure was scaled and optimized to serve these small files with minimal overhead.The payoff: ultimate flexibility for developersThis engineering effort wasn't just an academic exercise. It provides tangible benefits to developers building with our platform. The primary benefit is simplicity through unification, but the most powerful benefit is the ability to optimize for every platform.Consider the complex landscape of Apple devices. With our unified pipeline, you can create a player logic that does this:On iOS 17.1+: Use LL-DASH with the new Managed Media Source (MMS) API for ~2.0 second latency.On iOS 14.0 - 17.0: Use native LL-HLS for ~3.0 second latency.On older iOS versions: Automatically fall back to standard HLS with a reduced latency of ~9 seconds.This lets you provide the best possible experience on every device, all from a single backend and a single live source, without any extra configuration.Don't fly blind: observability in a low-latency worldA complex system is useless without visibility, and traditional metrics can be misleading for low-latency streaming. Simply looking at response_time from a CDN log is not enough.We had to rethink what to measure. For example:For an LL-HLS manifest, a high response_time (e.g., 500ms) is expected behavior, as it reflects the server correctly holding the request while waiting for the next part. A low response_time could actually indicate a problem. We monitor “Manifest Hold Time” to ensure this blocking mechanism is working as intended.For LL-DASH, a player requesting a chunk that isn't ready yet might receive a 404 Not Found error. While occasional 404s are normal, a spike can indicate origin-to-edge latency issues. This metric, combined with monitoring player liveCatchup behavior, gives a true picture of stream health.Gcore: one pipeline to serve them allThe paths of LL-HLS and LL-DASH may be different, but their destination is the same: real-time interaction with a global audience. By starting with a common foundation—the keyframe—and custom-engineering every component of our pipeline to handle this duality, we successfully solved the unification problem.The result is a single, robust system that gives developers the power of both protocols without the complexity of running two separate infrastructures. It’s how we deliver ±2.0s latency with LL-DASH and ±3.0s with LL-HLS, and it’s the foundation upon which we’ll build to push the boundaries of real-time streaming even further.

Gcore successfully stops 6 Tbps DDoS attack

Gcore recently detected and mitigated one of the most powerful distributed denial-of-service (DDoS) attacks of the year, peaking at 6 Tbps and 5.3 billion packets per second (Bpps).This surge, linked to the AISURU botnet, reflects a growing trend of large-scale attacks. It reminds us how crucial effective protection has become for companies that depend on high availability and low latency. 6 Tbps 5.3 BppsThe attack in numbersPeak traffic: 6 TbpsPacket rate: 5.3 BppsMain protocol: UDP, typical of volumetric floods designed to overwhelm bandwidthGeographic concentration: 51% of sources originated in Brazil and 23.7% in the US, together accounting for nearly 75% of all trafficGeographic sources This regional concentration shows how today’s botnets are expanding across areas with high device connectivity and weaker security measures, creating an ideal environment for mass exploitation.How to strengthen your defensesThe 6 Tbps attack is not an isolated incident. It marks an escalation in DDoS activity across industries where performance and availability are critical to customer satisfaction and company revenue. To protect your business from large-scale DDoS attacks, consider the following key strategies:Adopt an adaptive DDoS protection that detects and mitigates attacks automatically.Leverage edge infrastructure to absorb malicious traffic closer to its source.Prepare for high traffic volumes by upgrading your infrastructure or partnering with a reliable DDoS protection provider that has the global capacity and resources to keep your services online during large-scale attacks.Keeping your business safe with GcoreTo stay ahead of these evolving threats, companies need solutions that deliver real-time detection, intelligent mitigation, and global reach. Gcore’s DDoS Protection was built to do precisely that, leveraging AI-driven traffic analysis and worldwide network capacity to block attacks before they impact your users.As attacks grow larger and more complex, staying resilient means being prepared. With the right protection in place, your customers will never know an attack happened in the first place.Learn more about 2025 cyberattack trends

Gcore CDN updates: Dedicated IP and BYOIP now available

We’re pleased to announce two new premium features for Gcore CDN: Dedicated IP and Bring Your Own IP (BYOIP). These capabilities give customers more control over their CDN configuration, helping you meet strict security, compliance, and branding requirements.Many organizations, especially in finance and other regulated sectors, require full control over their network identity. With these new features, Gcore enables customers to use unique, dedicated IP addresses to meet compliance or security standards; retain ownership and visibility over IP reputation and routing, and deliver content globally while maintaining trusted, verifiable IP associations.Read on for more information about the benefits of both updates.Dedicated IP: exclusive addresses for your CDN resourcesThe Dedicated IP feature enables customers to assign a private IP address to their CDN configuration, rather than using shared ones. This is ideal for:Businesses that are subject to strict security or legal frameworks.Customers who want isolated IP resources to ensure consistent access and reputation.Teams using WAAP or other advanced security solutions where dedicated IPs simplify policy management.BYOIP: bring your own IP range to Gcore CDNWith Bring Your Own IP (BYOIP), customers can use their own public IP address range while leveraging the performance and global reach of Gcore CDN. This option is especially useful for:Resellers who prefer to keep Gcore infrastructure invisible to end clients.Enterprises maintaining brand consistency and control over IP reputation.How to get startedBoth features are currently available as paid add-ons and are configured manually by the Gcore team. To request activation or learn more, please contact Gcore Support or your account manager.We’re working on making these features easier to manage and automate in future releases. As always, we welcome your feedback on both the feature functionality and the request process—your insights help us improve the Gcore CDN experience for everyone.Get in touch for more information

Introducing AI Cloud Stack: turning GPU clusters into revenue-generating AI clouds

Enterprises and cloud providers face major roadblocks when trying to deploy GPU infrastructure at scale: long time-to-market, operational inefficiencies, and difficulty bringing new capacity to market profitably. Establishing AI environments with hyperscaler-grade functionality typically requires years of engineering effort, multiple partner integrations, and complex operational tooling.Not anymore.With Gcore AI Cloud Stack, organizations can transform bare Nvidia GPU clusters into a fully cloud-enabled environment—complete with orchestration, observability, billing, and go-to-market support—all in a fraction of the time it would take to build from scratch, maximizing GPU utilization.This proven solution marks the latest addition to the Gcore AI product suite, enabling enterprises and cloud providers to accelerate AI cloud deployment through better GPU utilization, monetization, reduced complexity, and hyperscaler-grade functionality in their own AI environments. Gcore AI Cloud Stack is already powering leading technology providers, including VAST and Nokia.Why we built AI Cloud StackBuying and efficiently operating GPUs at a large scale requires significant investment, time, and expertise. Most organizations need to hit the ground running, bypassing years of in-house R&D. Without a robust reference architecture, infrastructure and network preparation, 24/7 monitoring, dynamic resource allocation, orchestration abstraction, and clear paths to utilization or commercialization, enterprises can spend years before seeing ROI.“Gcore brings together the key pieces—compute, networking, and storage—into a usable stack. That integration helps service providers stand up AI clouds faster and onboard clients sooner, accelerating time to revenue. Combined with the advanced multi-tenant capabilities of VAST’s AI Operating System, it delivers a reliable, scalable, and futureproof AI infrastructure. Gcore offers operators a valuable option to move quickly without building everything themselves.”— Dan Chester, CSP Director EMEA, VAST DataAt Gcore, we understand that organizations across industries will continue to invest heavily in GPUs to power the next wave of AI innovation—meaning these challenges aren’t going away. AI Cloud Stack solves today’s challenges and anticipates tomorrow’s. It ensures that GPU infrastructure at the core of AI innovation delivers maximum value to enterprises.How AI Cloud Stack worksThis comprehensive solution is structured across three stages.1. Provision and launchGcore handles the complexities of initial deployment, from physical infrastructure setup to orchestration, enabling enterprises to go live quickly with a reliable GPU cloud.2. Operations and managementThe solution includes monitoring, orchestration, ticket management, and ongoing support to keep environments stable, secure, and efficient. This includes automated GPU failure handling and optimized resource management.3. Go-to-market supportUnlike other solutions, AI Cloud Stack goes beyond infrastructure. Building on Gcore’s experience as a trusted NVIDIA Cloud Provider (NCP), it helps customers sell their capacity, including through established reseller channels. This integrated GTM support ensures capacity doesn’t sit idle, losing value and potential.What sets Gcore apartUnlike many providers entering this market, Gcore has operated as a global cloud provider for over a decade and has been an early player in the global AI landscape. Gcore knows what it takes to build, scale, and sell cloud and AI services—because it has done it for customers and partners worldwide. Gcore AI Cloud Stack has already been deployed on thousands of NVIDIA Hopper GPUs across Europe to build a commercial-grade AI cloud with full orchestration, abstraction, and monetization layers. That real-world experience allows Gcore to deliver the infrastructure, operational playbook, and sales enablement customers need to succeed.“We’re pleased to collaborate with Gcore, a strong European ISV, to advance a networking reference architecture for AI clouds. Combining Nokia’s open, programmable, and reliable networking with Gcore’s cloud software accelerates deployable blueprints that customers can adopt across data centers and the edge.”— Mark Vanderhaegen, Head of Business Development, Data Center Networks, NokiaKey features of AI Cloud StackCloudification of GPU clusters: Transform raw infrastructure into cloud-like consumption: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), GPU as a Service (GPUaaS), or Model as a Service (MaaS).Gcore AI suite integration: Enable serverless inference and training capabilities through Gcore’s enterprise AI suite.Hyperscaler functionality: Built-in billing, observability, orchestration, and professional services deliver the tools CSPs and enterprises need to operate—similar to what they’re used to getting on public cloud.White-label options: Deliver capacity under your own brand while relying on Gcore’s proven global cloud backbone.NVIDIA AI Enterprise-ready: Integrate pretrained models, chatbots, and NVIDIA AI blueprints to accelerate time-to-market.The future of AI cloudsWith Gcore AI Cloud Stack, enterprises no longer need to spend years building the operational, technical, and commercial capabilities required to utilize and monetize GPU infrastructure. Instead, they can launch in a few months with a hyperscaler-grade solution designed for today’s AI demands.Whether you’re a cloud service provider, an enterprise investing in AI infrastructure, or a partner looking to accelerate GPU monetization, AI Cloud Stack gives you the speed, scalability, and GTM support you need.Ready to turn your GPU clusters into a fully monetized, production-grade AI cloud? Talk with our AI experts to learn how you can go from bare metal to model-as-a-service in months, not years.Get a customized consultation

Gcore Radar Q1–Q2 2025: three insights into evolving attack trends

Cyberattacks are becoming more frequent, larger in scale, and more sophisticated in execution. For businesses across industries, this means protecting digital resources is more important than ever. Staying ahead of attackers requires not only robust defense solutions but also a clear understanding of how attack patterns are changing.The latest edition of the Gcore Radar report, covering the first half of 2025, highlights important shifts in attack volumes, industry targets, and attacker strategies. Together, these findings show how the DDoS landscape is evolving, and why adaptive defense has never been more important.Here are three key insights from the report, which you can download in full here.#1. DDoS attack volumes continue to riseIn Q1–Q2 2025, the total number of DDoS attacks grew by 21% compared to H2 2024 and 41% year-on-year.The largest single attack peaked at 2.2 Tbps, surpassing the previous record of 2 Tbps in late 2024.The growth is driven by several factors, including the increasing availability of DDoS-for-hire services, the rise of insecure IoT devices feeding into botnets, and heightened geopolitical and economic tensions worldwide. Together, these factors make attacks not only more common but also harder to mitigate.#2. Technology overtakes gaming as the top targetThe distribution of attacks by industry has shifted significantly. Technology now represents 30% of all attacks, overtaking gaming, which dropped from 34% in H2 2024 to 19% in H1 2025. Financial services remain a prime target, accounting for 21% of attacks.This trend reflects attackers’ growing focus on industries with broader downstream impact. Hosting providers, SaaS platforms, and payment systems are attractive targets because a single disruption can affect entire ecosystems of dependent businesses.#3. Attacks are getting smarter and more complexAttackers are increasingly blending high-volume assaults with application-layer exploits aimed at web apps and APIs. These multi-layered tactics target customer-facing systems such as inventory platforms, payment flows, and authentication processes.At the same time, attack durations are shifting. While maximum duration has shortened from five hours to three, mid-range attacks lasting 10–30 minutes have nearly quadrupled. This suggests attackers are testing new strategies designed to bypass automated defenses and maximize disruption.How Gcore helps businesses stay protectedAs attack methods evolve, businesses need equally advanced protection. Gcore DDoS Protection offers over 200 Tbps filtering capacity across 210+ points of presence worldwide, neutralizing threats in real time. Integrated Web Application and API Protection (WAAP) extends defense beyond network perimeters, protecting against sophisticated application-layer and business-logic attacks. To explore the report’s full findings, download the complete Gcore Radar report here.Download Gcore Radar Q1-Q2 2025