
Mili Leitner Cohen

Content Marketing Lead, AI Products

Mili leads content marketing for the Gcore AI product team. She helps define how AI products are positioned, launched, and communicated globally. With more than a decade of experience in content and growth strategy, she makes complex innovation understandable and engaging for real audiences.

Introducing Gcore Everywhere AI: 3-click AI training and inference for any environment

For enterprises, telcos, and CSPs, AI adoption sounds promising…until you start measuring impact. Most projects stall or even fail before ROI starts to appear. ML engineers lose momentum setting up clusters. Infrastructure teams battle to balance performance, cost, and compliance. Business leaders see budgets rise while value stays locked in prototypes.

Gcore Everywhere AI changes that. It simplifies AI training, deployment, and scaling across on-premises, hybrid, and cloud environments, giving every team the speed, reliability, and control they need to turn AI initiatives into real outcomes.

Why we built Everywhere AI

Enterprises need AI that runs where it makes the most sense: on-premises for privacy, in the cloud for scale, or across both for hybrid agility. Not all enterprises are “AI-ready”, meaning that for many, the complexity of integrating AI offsets its benefits. We noticed that fragmented toolchains, complex provisioning, and compliance overhead can hinder the value of AI adoption.

That’s why we built Everywhere AI: to simplify deployment, orchestration, and scaling for AI workloads across any environment, all controlled in one intuitive platform. We’re on a mission to bring every enterprise, CSP, and telco team a consistent, secure, and simple way to make AI efficient—everywhere.

There are many tools on the market that promise to deliver similar benefits, but no other is able to simplify the deployment process to the point where it’s accessible to anyone in the business, regardless of their technical expertise. To use Everywhere AI, you don’t need to have a Ph.D. in Machine Learning or be a seasoned infrastructure engineer. Everywhere AI is for everyone at your organization.

“Enterprises today need AI that simply works, whether on-premises, in the cloud, or in hybrid deployments. With Everywhere AI, we’ve taken the complexity out of AI deployment, giving customers an easier, faster way to deploy high-performance AI with a streamlined user experience, stronger ROI, and simplified compliance across environments. This launch is a major step toward our goal at Gcore to make enterprise-grade AI accessible, reliable, and performant.”
Seva Vayner, Product Director of Edge Cloud and AI at Gcore

Features and benefits

Everywhere AI brings together everything needed to train, deploy, and scale AI securely and efficiently:

- Deploy in just 3 clicks: Move from concept to training in minutes using JupyterLab or Slurm. Or simply select your tool, cluster size, and location, and let Everywhere AI handle your setup, orchestration, and scaling automatically.
- Unified control plane: Manage training, inference, and scaling from one dashboard, across on-prem, hybrid, and cloud. Operate in public or private clouds, or in fully air-gapped environments when data can’t leave your network.
- Gcore Smart Routing: Inference requests automatically reach the nearest compliant GPU region for ultra-low latency and simplified regulatory alignment. Built on Gcore’s global edge network (210+ PoPs), Smart Routing delivers uncompromising performance worldwide.
- Auto-scaling: Handle demand spikes seamlessly. Scale to zero when idle to reduce costs, or burst instantly for inference peaks.
- Privacy and sovereignty: Designed for regulated industries, Everywhere AI supports hard multitenancy for project isolation and sovereign privacy for sensitive workloads. Whether hybrid or fully disconnected, your models stay under your control.
Proven results

Enterprises deploying Everywhere AI can expect to see measurable, repeatable improvements:

- 2× higher GPU utilization: Boost efficiency from ~40% to 80–95% with multi-tenancy and auto-scaling.
- 80% lower infrastructure admin load: Infrastructure teams are more productive with automated software rollout and updates.
- From POC to results in one week: Enterprise teams take less than a week to onboard, test, and start seeing performance improvements from Everywhere AI.

Early adopters are already validating Everywhere AI’s performance and flexibility.

“Gcore Everywhere AI and HPE GreenLake streamline operations by removing manual provisioning, improving GPU utilization, and meeting application requirements, including fully air-gapped environments and ultra-low latency. By simplifying AI deployment and management, we’re helping enterprises deliver AI faster and create applications that deliver benefits regardless of scale: good for ML engineers, infrastructure teams, and business leaders.”
Vijay Patel, Global Director Service Providers and Co-Location Business, HPE

Purpose-built for regulated industries

Everywhere AI is designed for organizations where privacy, uptime, and compliance are non-negotiable.

- Telcos: Use CDN-integrated Smart Routing to deliver real-time inference at carrier scale with consistent QoS.
- Finance firms: Deploy risk and fraud prevention models on-premises for data residency while benefiting from auto-scaling and multi-tenancy for maximum efficiency.
- Healthcare providers: Run imaging and diagnostics AI inside hospital networks to protect PHI.
- Public-sector agencies: Deliver robust AI-driven citizen services securely under strict compliance regimes.
- Industrial enterprises: Leverage model and GPU health checks on edge deployments to keep critical predictive maintenance models running in remote sites.

Run AI on your terms

Whether you’re training large models on-premises, scaling inference at the edge, or operating across multiple regions, Gcore Everywhere AI gives you full control over performance, cost, and compliance.

Ready to deploy AI everywhere you need it? Discover how Everywhere AI can simplify and accelerate your AI operations.

Learn more about Everywhere AI

November 3, 2025 · 3 min read

How we engineered a single pipeline for LL-HLS and LL-DASH

Viewers in sports, gaming, and interactive events expect real-time, low-latency streaming experiences. To deliver this, the industry has rallied around two powerful protocols: Low-Latency HLS (LL-HLS) and Low-Latency DASH (LL-DASH).

While they share a goal, their methods are fundamentally different. LL-HLS delivers video in a sequence of tiny, discrete files. LL-DASH delivers it as a continuous, chunked download of a larger file. This isn't just a minor difference in syntax; it implies completely different behaviors for the packager, the CDN, and the player.

This duality presents a major architectural challenge: how do you build a single, efficient, and cost-effective pipeline that can serve both protocols simultaneously from one source?

At Gcore, we took on this unification problem. The result is a robust, single-source pipeline that delivers streams with a glass-to-glass latency of approximately 2.0 seconds for LL-DASH and 3.0 seconds for LL-HLS. This is the story of how we designed it.

Understanding the duality

To build a unified system, we first had to deeply understand the differences in how each protocol operates at the delivery level.

LL-DASH: the continuous feed

MPEG-DASH has always been flexible, using a single manifest file to define media segments by their timing. Low-Latency DASH builds on this by using Chunked CMAF segments.

Imagine a file that is still being written to on the server. Instead of waiting for the whole file to be finished, the player can request it, and the server can send it piece by piece using Chunked Transfer Encoding. The player receives a continuous stream of bytes and can start playback as soon as it has enough data.

- Single, long-lived files: A segment might be 2–6 seconds long, but it’s delivered as it’s being generated.
- Timing-based requests: The player knows when a segment should be available and requests it. The server uses chunked transfer to send what it has so far.
- Player-driven latency: The manifest contains a targetLatency attribute, giving the player a strong hint about how close to the live edge it should play.

LL-HLS: the rapid-fire delivery

LL-HLS takes a different approach. It extends the traditional playlist-based HLS by breaking segments into even smaller chunks called Parts.

Think of it like getting breaking news updates. The server pre-announces upcoming Parts in the manifest before they are fully available. The player then requests a Part, but the server holds that request open until the Part is ready to be delivered at full speed. This is called a Blocking Playlist Reload.

- Many tiny files (Parts): A 2-second segment might be broken into four 0.5-second Parts, each requested individually.
- Manifest-driven updates: The server constantly updates the manifest with new Parts, and uses special tags like #EXT-X-PART-INF and #EXT-X-SERVER-CONTROL to manage delivery.
- Server-enforced timing: The server controls when the player receives data by holding onto requests, which helps synchronize all viewers.

[Diagram: LL-HLS delivery of many small Parts compared with LL-DASH chunked transfer of a single, larger segment over the same time period.]

These two philosophies demand different things from a CDN. LL-DASH requires the CDN to intelligently cache and serve partially complete files. LL-HLS requires the CDN to handle a massive volume of short, bursty requests and hold connections open for manifest updates. A traditional CDN is optimized for neither.
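To make the two request models concrete, here is a minimal client-side sketch in TypeScript. It is illustrative only: the URLs and the feedToDecoder stub are hypothetical, but the mechanics match what is described above, a long-lived chunked read for LL-DASH, and the _HLS_msn/_HLS_part delivery directives for an LL-HLS Blocking Playlist Reload.

    // Hypothetical stand-in for a real decoder/source-buffer pipeline.
    function feedToDecoder(bytes: Uint8Array): void {
      console.log(`received ${bytes.byteLength} bytes`);
    }

    // LL-DASH: fetch a segment that is still being written on the server.
    // Chunked Transfer Encoding lets the client consume bytes as produced.
    async function readChunkedSegment(url: string): Promise<void> {
      const response = await fetch(url);
      const reader = response.body!.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done || !value) break; // server finished writing the segment
        feedToDecoder(value);      // play out bytes before the file is complete
      }
    }

    // LL-HLS: Blocking Playlist Reload. The server holds the request open
    // until media sequence `msn`, Part `part` actually exists, then responds.
    async function blockingPlaylistReload(
      playlistUrl: string,
      msn: number,
      part: number
    ): Promise<string> {
      const response = await fetch(`${playlistUrl}?_HLS_msn=${msn}&_HLS_part=${part}`);
      return response.text(); // the returned manifest announces the new Part
    }

In the LL-DASH case one long-lived response does the work; in the LL-HLS case the same latency goal is met by many short requests whose timing the server controls.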
Forging a unified strategy

With two different delivery models, where do you start? You find the one thing they both depend on: the keyframe.

Playback can only start from a keyframe (or I-frame). Therefore, the placement of keyframes, which defines the Group of Pictures (GOP), is the foundational layer that both protocols must respect. By enforcing a consistent keyframe interval on the source stream, we could create a predictable media timeline. This timeline can then be described in two different “languages” in the manifests for LL-HLS and LL-DASH.

[Diagram: a single timeline with consistent GOPs being packaged for both protocols.]

This realization led us to a baseline configuration, but each parameter involved a critical engineering trade-off:

- GOP: 1 second. We chose a frequent, 1-second GOP. The primary benefit is extremely fast stream acquisition; a player never has to wait more than a second for a keyframe to begin playback. The trade-off is a higher bitrate: a 1-second GOP can increase bitrate by 10–15% compared to a more standard 2-second GOP because you're storing more full-frame data. For real-time, interactive use cases, we prioritized startup speed over bitrate savings.
- Segment Size: 2 seconds. A 2-second segment duration provides a sweet spot. For LL-DASH and modern HLS players, it's short enough to keep manifest sizes manageable. For older, standard HLS clients, it prevents them from falling too far behind the live edge, keeping latency reduced even on legacy players.
- Part Size: 0.5 seconds. For LL-HLS, this means we deliver four updates per segment. This frequency is aggressive enough to achieve sub-3-second latency while being coarse enough to avoid overwhelming networks with excessive request overhead, which can happen with part durations in the 100–200 ms range.

Cascading challenges through the pipeline

1. Ingest: predictability is paramount

To produce a clean, synchronized output, you need a clean, predictable input. We found that the encoder settings of the source stream are critical. An unstable source with a variable bitrate or erratic keyframe placement will wreck any attempt at low-latency delivery.

For our users, we recommend settings that prioritize speed and predictability over compression efficiency:

- Rate control: Constant Bitrate (CBR)
- Keyframe interval: A fixed interval (e.g., every 30 frames for 30 FPS, to match our 1 s GOP)
- Encoder tune: zerolatency
- Advanced options: Disable B-frames (bframes=0) and scene-cut detection (scenecut=0) to ensure keyframes are placed exactly where you command them to be

Here is an example ffmpeg command in Bash that encapsulates these principles:

    # CBR-friendly, zero-latency x264 with a fixed 1-second keyframe interval
    ffmpeg -re -i "source.mp4" -c:a aac -c:v libx264 \
      -profile:v baseline -tune zerolatency -preset veryfast \
      -x264opts "bframes=0:scenecut=0:keyint=30" \
      -f flv "rtmp://your-ingest-url"
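Once ingest is locked to this cadence, the LL-HLS and LL-DASH manifests are just two descriptions of one timeline. The TypeScript fragment below is a simplified illustration of that arithmetic, not our production logic: it shows how the 1 s GOP / 2 s segment / 0.5 s Part constants let a client derive what to request at the live edge.

    // Baseline constants from the configuration above.
    const GOP_SECONDS = 1.0;     // keyframe every second
    const SEGMENT_SECONDS = 2.0; // each segment spans exactly 2 GOPs
    const PART_SECONDS = 0.5;    // each segment is announced as 4 LL-HLS Parts

    // Segments can only start on keyframes; the configuration guarantees it.
    console.assert(SEGMENT_SECONDS % GOP_SECONDS === 0);

    // Both protocols derive their live-edge request from the same clock.
    function liveEdge(streamStartMs: number, nowMs: number) {
      const elapsed = (nowMs - streamStartMs) / 1000;
      // LL-DASH: request segment N; it arrives chunked while still written.
      const segment = Math.floor(elapsed / SEGMENT_SECONDS);
      // LL-HLS: block on Part M of segment N via _HLS_msn/_HLS_part.
      const part = Math.floor((elapsed % SEGMENT_SECONDS) / PART_SECONDS);
      return { segment, part };
    }

    // Example: 7.3 s into the stream -> segment 3, Part 2 (Parts run 0-3).
    console.log(liveEdge(0, 7_300)); // { segment: 3, part: 2 }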
2. Transcoding and packaging

Our transcoding and Just-In-Time Packaging (JITP) layer is where the unification truly happens. This component does more than just convert codecs; it has to operate on a stream that is fundamentally incomplete.

The primary challenge is that the packager must generate manifests and parts from media files that are still being written by the transcoder. This requires a tightly coupled architecture where the packager can safely read from the transcoder's buffer.

To handle the unpredictable nature of live sources, especially user-generated content via WebRTC, we use a hybrid workflow:

- GPU Workers (Nvidia/Intel): These handle the heavy lifting of decoding and encoding. Offloading to GPU hardware is crucial for minimizing processing latency and preserving advanced color formats like HDR+.
- Software Workers and Filters: These provide flexibility. When a live stream from a mobile device suddenly changes resolution or its framerate drops due to a poor connection, a rigid hardware pipeline would crash. Our software layer handles these context changes gracefully, for instance by scaling the erratic source and overlaying it on a stable, black-background canvas, so the output stream never stops.

This makes our JITP a universal packager, creating three synchronized content types from a single, resilient source:

- LL-DASH (CMAF)
- LL-HLS (CMAF)
- Standard HLS (MPEG-TS) for backward compatibility

3. CDN delivery: solving two problems at once

This was the most intensive part of the engineering effort. Our CDN had to be taught how to excel at two completely different, high-performance tasks simultaneously.

For LL-DASH, we developed a custom caching module we call chunked-proxy. When the first request for a new .m4s segment arrives, our edge server requests it from the origin. As bytes flow in from the origin, the chunked-proxy immediately forwards them to the client. When a second client requests the same file, our edge server serves all the bytes it has already cached and then appends the new bytes to both clients' streams simultaneously. It’s a multi-client cache for in-flight data.
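The following is a deliberately simplified TypeScript sketch of that idea, not our production edge module: one origin fetch per segment feeds every client, replaying already-cached bytes to late joiners and fanning out new bytes as they arrive. Names like InFlightSegment are invented for illustration.

    type Sink = (chunk: Uint8Array | null) => void; // null = end of segment

    class InFlightSegment {
      private chunks: Uint8Array[] = [];
      private done = false;
      private sinks: Sink[] = [];

      constructor(originUrl: string) {
        void this.pump(originUrl); // start the single origin transfer
      }

      private async pump(originUrl: string): Promise<void> {
        const response = await fetch(originUrl);
        const reader = response.body!.getReader();
        for (;;) {
          const { done, value } = await reader.read();
          if (done || !value) break;
          this.chunks.push(value);             // cache the in-flight bytes
          this.sinks.forEach((s) => s(value)); // fan out to attached clients
        }
        this.done = true;
        this.sinks.forEach((s) => s(null));    // signal end of segment
      }

      // A client first receives everything cached so far, then live updates.
      attach(sink: Sink): void {
        this.chunks.forEach((c) => sink(c));
        if (this.done) sink(null);
        else this.sinks.push(sink);
      }
    }

    // One in-flight entry per segment URL: later requests attach to the
    // existing transfer instead of triggering a second origin fetch.
    const inFlight = new Map<string, InFlightSegment>();

    function serveSegment(url: string, sink: Sink): void {
      if (!inFlight.has(url)) inFlight.set(url, new InFlightSegment(url));
      inFlight.get(url)!.attach(sink);
    }

A real implementation also needs eviction, error handling, and cache-status logic, but this fan-out pattern is the core of serving partially complete files.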
For LL-HLS, the challenges were different:

- Handling Blocked Requests: Our edge servers needed to be optimized to hold thousands of manifest requests open for hundreds of milliseconds without consuming excessive resources.
- Intelligent Caching: We needed to correctly handle cache statuses (MISS, EXPIRED) for manifests to ensure only one request goes to the origin per update, preventing a "thundering herd" problem.
- High Request Volume: LL-HLS generates a storm of requests for tiny part-files. Our infrastructure was scaled and optimized to serve these small files with minimal overhead.

The payoff: ultimate flexibility for developers

This engineering effort wasn't just an academic exercise. It provides tangible benefits to developers building with our platform. The primary benefit is simplicity through unification, but the most powerful benefit is the ability to optimize for every platform.

Consider the complex landscape of Apple devices. With our unified pipeline, you can create player logic that does this:

- On iOS 17.1+: Use LL-DASH with the new Managed Media Source (MMS) API for ~2.0-second latency.
- On iOS 14.0–17.0: Use native LL-HLS for ~3.0-second latency.
- On older iOS versions: Automatically fall back to standard HLS with a reduced latency of ~9 seconds.

This lets you provide the best possible experience on every device, all from a single backend and a single live source, without any extra configuration.

Don't fly blind: observability in a low-latency world

A complex system is useless without visibility, and traditional metrics can be misleading for low-latency streaming. Simply looking at response_time from a CDN log is not enough. We had to rethink what to measure. For example:

- For an LL-HLS manifest, a high response_time (e.g., 500 ms) is expected behavior, as it reflects the server correctly holding the request while waiting for the next part. A low response_time could actually indicate a problem. We monitor “Manifest Hold Time” to ensure this blocking mechanism is working as intended.
- For LL-DASH, a player requesting a chunk that isn't ready yet might receive a 404 Not Found error. While occasional 404s are normal, a spike can indicate origin-to-edge latency issues. This metric, combined with monitoring player liveCatchup behavior, gives a true picture of stream health.

Gcore: one pipeline to serve them all

The paths of LL-HLS and LL-DASH may be different, but their destination is the same: real-time interaction with a global audience. By starting with a common foundation—the keyframe—and custom-engineering every component of our pipeline to handle this duality, we successfully solved the unification problem.

The result is a single, robust system that gives developers the power of both protocols without the complexity of running two separate infrastructures. It’s how we deliver ~2.0 s latency with LL-DASH and ~3.0 s with LL-HLS, and it’s the foundation upon which we’ll build to push the boundaries of real-time streaming even further.

October 20, 2025 · 6 min read
