We are excited to announce a new Gcore development: our JIT (Just-In-Time) Packager. This solution facilitates simultaneous streaming across six protocols: HLS, DASH, L-HLS, Chunked CMAF/DASH, Apple Low Latency HLS, and HESP. In this article, we’ll explain why HLS and DASH streaming make low latency a challenge, dive into alternative technologies with exciting latency reduction potential, and then tell you about our JIT Packager—why we developed it, how it works, its capabilities, benefits, and results.
The difficulty in achieving low latency with standard HLS and DASH technologies stems from their recommended segment length and buffer size guidelines, which can result in latency of twenty seconds or more. Let’s explore why this is the case.
In conventional internet streaming, technologies such as HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) are commonly employed. These protocols are based on HTTP and divide video and audio content into small segments spanning a few seconds. This segmentation facilitates fast navigation, bitrate switching, and caching. The client receives a text document containing the sequence of segments, their addresses, and additional metadata such as resolution, codecs, bitrate, duration, and language.
Here’s an example: Let’s say we initiate transcoding and create segments with a duration of 6 seconds, then start playing the stream. The player first needs to fill its buffer by loading three segments. At the moment playback starts, the three fully formed segments closest to real time are segments 3, 4, and 5, so playback begins with segment 3. The delay is easy to calculate from the segment duration: a minimum of 18 seconds.
Using shorter segments, such as 1–2 seconds, we can reduce the delay: segments with a duration of 2 seconds would result in a minimum delay of 6 seconds. However, this would require reducing the GOP (group of pictures) size. Reducing GOP size lowers encoding efficiency and increases traffic overhead, because each segment carries container and metadata overhead in addition to the video and audio. There is also HTTP protocol overhead with each segment request.
This means that shorter segments lead to a larger number of segments and, consequently, higher overhead. With a large number of viewers constantly requesting segments, this would result in significant traffic consumption.
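As a quick sanity check on the arithmetic above, here is a trivial sketch in Go of the relationship between segment duration, buffer depth, and minimum startup delay (three buffered segments is the figure from the example):

```go
package main

import "fmt"

// Minimum startup delay = buffered segments x segment duration.
func main() {
	const bufferedSegments = 3
	for _, segSeconds := range []int{6, 2} {
		fmt.Printf("%d-second segments -> minimum delay of %d seconds\n",
			segSeconds, bufferedSegments*segSeconds)
	}
}
```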
To achieve lower latency in streaming, several specialized solutions can be utilized:

- L-HLS (the community low-latency HLS approach, based on prefetch segments)
- Chunked CMAF/DASH
- Apple Low Latency HLS (LL-HLS)
- HESP (High Efficiency Stream Protocol)
These solutions differ from traditional HLS and DASH protocols in that they are specifically tailored for low-latency streaming.
Now, let’s dive into these protocols in more detail.
In the case of L-HLS, the client receives new fragments of the last segment as they become available. This is achieved by declaring the address of that segment in the playlist using a special PREFETCH tag. The player requests the prefetched segment before it is complete and receives its fragments over chunked transfer as they are produced, which significantly shortens the data path and reduces latency.
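For illustration, a hypothetical L-HLS media playlist might look like this (segment names are invented; the community spec announces upcoming segments with the PREFETCH tag):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:6.000,
segment100.ts
#EXTINF:6.000,
segment101.ts
#EXT-X-PREFETCH:segment102.ts
#EXT-X-PREFETCH:segment103.ts
```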
When it comes to chunked CMAF/DASH, the standard includes fields that control the timeline, update frequency, delay, and distance to the live edge of the playlist. Starting with version 2.6.8, the dash.js reference player supports chunked transfer encoding and uses the Fetch API wherever possible, delivering data to the player as soon as it becomes available.
A low-latency stream is signaled through the Latency target and availabilityTimeOffset attributes, which declare the target delay and allow fragment loading to begin before the full segment has been formed.
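For example, an abbreviated and purely illustrative low-latency MPD could look like the following, with the target latency expressed in milliseconds and availabilityTimeOffset in seconds:

```xml
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic"
     minimumUpdatePeriod="PT2S">
  <!-- Target delay of 3 seconds, signaled to the player -->
  <ServiceDescription id="0">
    <Latency target="3000" min="2000" max="6000"/>
  </ServiceDescription>
  <Period start="PT0S">
    <AdaptationSet contentType="video">
      <!-- Segments may be requested 5 s before they are complete -->
      <SegmentTemplate timescale="1000" duration="6000"
                       availabilityTimeOffset="5.0"
                       availabilityTimeComplete="false"
                       initialization="init-$RepresentationID$.m4s"
                       media="chunk-$RepresentationID$-$Number$.m4s"/>
      <Representation id="video" bandwidth="3000000" codecs="avc1.64001f"/>
    </AdaptationSet>
  </Period>
</MPD>
```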
By utilizing these technologies, it is possible to achieve delays in the range of 2–6 seconds, depending on the configuration and settings of both the server-side and player-side components. Furthermore, there is backward compatibility: devices that do not understand low-latency formats can play back full segments as before.
Apple LL-HLS offers several latency optimization solutions, including:

- Partial segments (EXT-X-PART), which let the player download a segment a few hundred milliseconds at a time while it is still being produced
- Preload hints (EXT-X-PRELOAD-HINT), which announce the next part before it exists so the server can respond the moment it is ready
- Blocking playlist reloads, where the server holds a playlist request open until an updated version is available instead of being polled by the client
- Playlist delta updates (EXT-X-SKIP), which shrink repeated playlist transfers
- Rendition reports (EXT-X-RENDITION-REPORT), which speed up switching between quality levels
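Put together, a hypothetical LL-HLS media playlist exercising these features might look like this:

```
#EXTM3U
#EXT-X-VERSION:9
#EXT-X-TARGETDURATION:6
#EXT-X-SERVER-CONTROL:CAN-BLOCK-RELOAD=YES,PART-HOLD-BACK=1.002
#EXT-X-PART-INF:PART-TARGET=0.334
#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-MAP:URI="init.mp4"
#EXTINF:6.000,
segment100.m4s
#EXT-X-PART:DURATION=0.334,URI="segment101.part0.m4s",INDEPENDENT=YES
#EXT-X-PART:DURATION=0.334,URI="segment101.part1.m4s"
#EXT-X-PRELOAD-HINT:TYPE=PART,URI="segment101.part2.m4s"
```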
Only Apple LL-HLS works natively on Apple devices, making its implementation necessary for low-latency streaming on these devices.
HESP (High Efficiency Stream Protocol) is an HTTP-based adaptive video streaming protocol designed for ultra-low-latency delivery, capable of achieving delays of 2 seconds or less. Unlike the previous solutions, HESP requires 10–20% less bandwidth for streaming by allowing the use of longer GOP (group of pictures) durations.
Delivery relies on chunked transfer encoding. The player first receives a JSON manifest containing stream information and timing, and streaming then takes place over two streams: the Initialization Stream and the Continuation Stream.
The Initialization Stream contains only I-frames (keyframes), so the player can request an image for any point in time and use it to initiate playback. Once playback has begun from that keyframe, the Continuation Stream takes over and carries the stream from there.
This enables fast and uninterrupted video transmission and playback in the user’s player, as well as seamless quality switching. The illustration demonstrates an example where one video quality is initially played and then switched to another, with the Initialization Stream requested once.
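For intuition, here is a rough sketch of that startup sequence in Go; the manifest fields and endpoint URLs are hypothetical placeholders, not HESP’s actual wire format:

```go
package main

import "fmt"

// Hypothetical descriptor standing in for the HESP JSON manifest.
type manifest struct {
	InitStreamURL string // Initialization Stream: keyframes only
	ContStreamURL string // Continuation Stream: the regular stream
}

// startPlayback sketches HESP startup: fetch any keyframe from the
// Initialization Stream, then play on from the Continuation Stream.
func startPlayback(m manifest, positionMs int64) {
	// Every frame in the Initialization Stream is an I-frame, so the
	// response is decodable immediately, with no segment boundary to wait for.
	fmt.Printf("GET %s?t=%d   (single keyframe)\n", m.InitStreamURL, positionMs)

	// From that keyframe onward, the Continuation Stream is delivered
	// as one long-lived chunked HTTP response.
	fmt.Printf("GET %s?t=%d   (chunked, continuous)\n", m.ContStreamURL, positionMs)
}

func main() {
	startPlayback(manifest{
		InitStreamURL: "https://cdn.example.com/stream/720p/init",
		ContStreamURL: "https://cdn.example.com/stream/720p/cont",
	}, 0)
}
```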
To implement all these protocols, we decided to create our own solution, and there are several reasons behind this decision.
When considering the specific technologies, not all third-party solutions support Apple LL-HLS and HESP. For instance, the Apple Media Stream Segmenter is limited to MPEG-2 TS over UDP, runs only on macOS, and writes its output to the file system. The HESP packager + HTTP origin, on the other hand, transmits files via Redis and is written in TypeScript.
It’s important to note that relying on these external solutions consumes resources, introduces delays and dependencies, and can impact parallelism and scalability. Moreover, managing a diverse array of solutions can complicate maintenance and support.
The operation of our JIT Packager can be outlined as follows:

1. The transcoder sends video and audio streams to the packager in fragmented MP4 format.
2. The packager extracts the media data and generates initialization segments, playlists, and video fragments on the fly for every supported format.
3. The resulting streams are distributed to viewers via our CDN.
On average, we achieved a caching rate of approximately 80%.
Let’s take a look at what we have accomplished with our JIT Packager.
We have successfully developed a unique JIT Packager capable of simultaneously streaming video in HLS, DASH, and all currently available low-latency streaming formats. It accepts video and audio streams from the transcoder in fragmented MP4 format, extracts all the necessary media data directly from the MP4 files, and dynamically generates initialization segments, corresponding playlists, and video fragments for all of the aforementioned streaming modes with minimal delay. The streams then become available for distribution via a CDN.
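To make “dynamically generates” concrete, here is a heavily simplified sketch in Go of the just-in-time idea: the playlist is rendered per request from in-memory state instead of being written to disk ahead of time. The types and naming are illustrative, not our production code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// stream holds the in-memory state the playlist is rendered from.
type stream struct {
	mediaSequence int
	durations     []float64 // one entry per live segment, in seconds
}

// servePlaylist builds an HLS media playlist on demand for each request.
func (s *stream) servePlaylist(w http.ResponseWriter, r *http.Request) {
	var b strings.Builder
	b.WriteString("#EXTM3U\n#EXT-X-VERSION:6\n#EXT-X-TARGETDURATION:6\n")
	fmt.Fprintf(&b, "#EXT-X-MEDIA-SEQUENCE:%d\n", s.mediaSequence)
	b.WriteString("#EXT-X-MAP:URI=\"init.mp4\"\n")
	for i, d := range s.durations {
		fmt.Fprintf(&b, "#EXTINF:%.3f,\nsegment%d.m4s\n", d, s.mediaSequence+i)
	}
	w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
	w.Write([]byte(b.String()))
}

func main() {
	s := &stream{mediaSequence: 100, durations: []float64{6, 6, 6}}
	http.HandleFunc("/stream/index.m3u8", s.servePlaylist)
	http.ListenAndServe(":8080", nil)
}
```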
Our solution operates within an internal network using HTTP/1.1 without TLS: inside our own infrastructure, TLS offers no benefit and would only add the overhead of encrypting the entire stream once more. Data is transmitted using chunked transfer encoding.
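As a sketch of what that looks like in practice: Go’s HTTP/1.1 server switches to chunked transfer encoding automatically when no Content-Length is set, so streaming a still-growing segment reduces to writing and flushing each fragment as it arrives. The newChunks channel below is a hypothetical stand-in for the packager’s internal plumbing.

```go
package packager

import "net/http"

// serveGrowingSegment streams a segment that is still being produced.
// newChunks is a hypothetical channel fed as fragments of the segment
// arrive from the transcoder.
func serveGrowingSegment(w http.ResponseWriter, newChunks <-chan []byte) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "video/mp4")
	// No Content-Length is set, so the HTTP/1.1 response is sent with
	// Transfer-Encoding: chunked automatically.
	for chunk := range newChunks {
		w.Write(chunk)  // each write becomes one HTTP chunk
		flusher.Flush() // pushed to the client immediately
	}
}
```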
As a result, we have developed not only a packager but also an HTTP server capable of delivering video in all the previously mentioned formats. Moreover, the same video and audio streams are utilized for each format, ensuring efficient resource utilization.
We have implemented DVR functionality to allow users who have missed a live broadcast to rewind and catch up. All micro-segments are stored in a separate cache in the server’s memory; they are then merged and cached on disk as complete video fragments, which are served during backward playback. DVR segments are automatically deleted after a set period of time.
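A minimal sketch of that merge step, assuming the micro-segments are already held in memory as byte slices (the path and naming are illustrative):

```go
package packager

import (
	"bytes"
	"os"
)

// flushToDVR merges in-memory micro-segments into one complete fragment
// and caches it on disk, where it can be served for backward playback.
func flushToDVR(microSegments [][]byte, path string) error {
	fragment := bytes.Join(microSegments, nil)
	return os.WriteFile(path, fragment, 0o644)
}
```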
When it comes to protocols utilizing chunked transfer encoding, it is important to note that not all CDNs support caching files before they have been fully downloaded from the origin server. nginx, acting as a proxy server, can handle a chunked-transfer upstream and proxy its response chunk by chunk, but until the complete response is available, subsequent requests bypass the cache and go directly to the source; the cache is only used once the full response has been received. This approach proves ineffective for scaling low-latency video streaming, where a significant number of viewers are likely to request the last segment simultaneously.
To address this challenge, we have implemented a separate caching service for chunked-proxy requests on each CDN node. Its key feature lies in the ability to cache partial HTTP responses. This means that while the first client initiating the request to the source receives its response, any number of clients desiring the same response will be served by our server with minimal overall delay. The already-received portions will be immediately delivered, while the rest will be provided as they arrive from the source. This caching service stores the passing requests in the server’s memory, allowing us to reduce latency compared to storing fragments on disk.
Memory usage limits are also taken into account: if the total cache size reaches the limit, elements are evicted in least-recently-accessed order. Furthermore, we have developed a specialized API that enables CDN edge nodes to determine the content’s location in advance.
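In essence, each cache entry lets clients immediately read the chunks that have already arrived and blocks them only for chunks still in flight. Here is a condensed sketch of such an entry, with eviction and the HTTP plumbing omitted; an illustration rather than our production code:

```go
package packager

import "sync"

// entry caches one in-flight HTTP response chunk by chunk.
type entry struct {
	mu     sync.Mutex
	cond   *sync.Cond
	chunks [][]byte
	done   bool
}

func newEntry() *entry {
	e := &entry{}
	e.cond = sync.NewCond(&e.mu)
	return e
}

// appendChunk stores a chunk received from the origin and wakes readers.
func (e *entry) appendChunk(c []byte) {
	e.mu.Lock()
	e.chunks = append(e.chunks, c)
	e.cond.Broadcast()
	e.mu.Unlock()
}

// finish marks the origin response as complete.
func (e *entry) finish() {
	e.mu.Lock()
	e.done = true
	e.cond.Broadcast()
	e.mu.Unlock()
}

// chunkAt returns chunk i: already-received chunks return immediately,
// later ones block until the origin delivers them. ok is false once the
// response is complete and no chunk i exists.
func (e *entry) chunkAt(i int) (chunk []byte, ok bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for i >= len(e.chunks) && !e.done {
		e.cond.Wait()
	}
	if i < len(e.chunks) {
		return e.chunks[i], true
	}
	return nil, false
}
```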
The development of our JIT Packager has allowed us to achieve our goals in low-latency streaming. We can stream through multiple advanced protocols simultaneously without relying on third-party vendors, significantly improving the user experience. We can promptly respond to incidents and adapt the solution to meet client needs more efficiently.
But we’re not stopping there. Our plans include further reducing latency while maintaining quality and playback stability. We are also optimizing the system as a whole, adding more metrics for monitoring and control, and continuing to push the boundaries of innovation in the field.
We are excited about the possibilities ahead and remain dedicated to delivering high-quality, low-latency streaming experiences to our users.