This article originally appeared on The New Stack, where Gcore developers are regular expert contributors.
At Gcore, we’re continuously enhancing our packet processing core to strengthen Gcore DDoS Protection. This involves integrating key features like the regular expression engine and adapting to the dynamic requirements of online traffic. Our customers frequently update their security policies, and we consider it crucial to adapt our protection suite to those changes as part of our commitment to evolving and improving cybersecurity. In this article, we’ll explain the techniques we use to manage eBPF/XDP effectively and discuss the importance of flexibility and adaptability in DDoS protection to accommodate our customers’ changing security policy settings.
How Gcore Innovates With eBPF for Enhanced DDoS Protection Configuration
The development team at Gcore has faced the unusual challenge of creating systems to serve a broad customer base, setting us apart from the usual practice of developing for internal use. Recognizing the need for rapid and frequent updates to Gcore DDoS Protection, we have moved beyond the standard one or two daily updates for self-hosted solutions to the almost constant updates required by service providers. This need, often overlooked in Linux applications, prompted us to embrace eBPF technology, enabling quick, uninterrupted updates.
Our progress towards this solution was deliberate, involving a thorough exploration of various approaches to ensure optimal management of our eBPF configurations. We share our insights and strategies, encouraging careful planning and execution of eBPF programs for peak efficiency. We’ll explore these strategies and their benefits in the following sections of this article, providing insights into maximizing eBPF’s full potential.
Configuration Management of XDP
eBPF maps provide an interface for atomically updating segments of shared memory, which makes them a robust configuration interface for eBPF programs. The Read-Copy-Update (RCU) mechanism keeps the performance footprint in the hot path small. Additionally, eBPF maps allow exclusive access to shared memory fragments. A combination of map types, including arrays, hash tables, bloom filters, queues, and ring buffers, can accommodate even a complex configuration.
As configuration complexity grows, so does the need for more connections between different maps’ entries. Eventually, if the number of connections between map entries becomes excessive, the ability to perform atomic configuration updates diminishes. Furthermore, updating a single map entry might necessitate simultaneous updates in other maps, risking inconsistency during the update period.
Consider a simple XDP (eXpress Data Path) program that classifies and filters traffic based on a prioritized 5-tuple ruleset. The program decides how to process each packet based on the rule’s priority and the packet’s source IP address, destination IP address, protocol, and source and destination ports.
Here are examples of rules for a network configuration:
- Always allow ANY traffic from subnet A.
- Restrict access to web servers in subnet B for clients from subnet C.
- Restrict access to web servers in subnet B.
- Deny all other access.
These rules require storing both traffic classification rules and restrictions in the configuration, which can be achieved by using eBPF maps.
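To make the ruleset concrete, here is a minimal user-space sketch in Python, not eBPF code. The subnet addresses, the choice of TCP port 443 for "web servers", the policy names, and the convention that a lower priority number wins are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical address plan standing in for subnets A, B and C.
SUBNET_A, SUBNET_B, SUBNET_C = "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"
ANY = "0.0.0.0/0"
TCP = 6

@dataclass(frozen=True)
class Rule:
    priority: int            # lower number is evaluated first (assumption)
    src: str                 # source subnet in CIDR notation
    dst: str                 # destination subnet in CIDR notation
    proto: Optional[int]     # IP protocol number, None matches any
    sport: Optional[int]     # source port, None matches any
    dport: Optional[int]     # destination port, None matches any
    policy: str              # name of the policy to apply (hypothetical)

RULES = [
    Rule(0, SUBNET_A, ANY, None, None, None, "allow"),
    Rule(1, SUBNET_C, SUBNET_B, TCP, None, 443, "restrict_from_c"),
    Rule(2, ANY, SUBNET_B, TCP, None, 443, "restrict_web"),
    Rule(3, ANY, ANY, None, None, None, "deny"),
]

def classify(rules, src_ip, dst_ip, proto, sport, dport):
    """Return the policy of the first matching rule in priority order."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and rule.proto in (None, proto)
                and rule.sport in (None, sport)
                and rule.dport in (None, dport)):
            return rule.policy
    return "deny"  # nothing matched: fall back to the most restrictive action
```

In an XDP program the linear scan would typically be replaced by map lookups, which is where the classification rules and the restrictions end up in separate maps.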
Understanding Configuration in eBPF Programs as a Tree Structure
We can visualize configurations as a hierarchical tree, with a “configuration root” at its base serving as the foundation. This root, which may be virtual, primarily organizes various configuration entities to form the active configuration. Entities either connect directly to the root for immediate global access or nest within other entities for structured organization. Accessing a specific entity begins at the root, progressing sequentially or “dereferencing” level by level to the desired entity. For example, to retrieve a Boolean flag from an “options” structure within a collection, one navigates to the collection, locates the structure, and then retrieves the flag.
This tree-like structure offers flexibility in configuration management, including atomic swaps of any subtree, ensuring smooth transitions without disruption. However, increased complexity brings challenges; as configurations become more intricate, the interconnections among entries intensify. It’s common for several parent entries to point to a single child entry, and for an entry to play dual roles, acting as a property of one entity while also being part of a collection.
Modern programming languages have developed mechanisms to manage complex configurations safely. Developers use reference counters, mutable and immutable references, and garbage collectors to ensure safe updates. However, it’s critical to understand that safety in managing these configurations doesn’t guarantee atomicity when switching between configuration versions.
The limitations of eBPF maps have led our team at Gcore to rethink our configuration storage strategies. The inability of eBPF map entries to store direct pointers to arbitrary memory segments, due to kernel safety verifications, requires us to use search keys for map entry access, slowing down the lookup process. However, this drawback offers a benefit: it allows for dividing complex configuration trees into smaller, more manageable segments, linked directly to the configuration root, ensuring consistency even during non-atomic updates.
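The key-based dereferencing described above can be modeled in a few lines of Python. This is only a sketch: each dictionary stands in for one eBPF map, and the map names, keys, and the `strict_mode` flag are hypothetical. Values hold lookup keys rather than pointers, and a subtree is swapped by rewriting a single root entry.

```python
# Each dict stands in for one eBPF map; values hold keys, never pointers.
config_root = {"segments": "segments_v1"}               # root entry names the active collection
collections = {"segments_v1": {"web": "opts_web_v1"}}   # collection: entity -> options key
options = {"opts_web_v1": {"strict_mode": True}}        # leaf "options" structures

def lookup_flag(segment, flag):
    """Dereference level by level: root -> collection -> options -> flag."""
    collection_key = config_root["segments"]
    options_key = collections[collection_key][segment]
    return options[options_key][flag]

before = lookup_flag("web", "strict_mode")

# Build the replacement subtree off to the side; nothing references it yet.
options["opts_web_v2"] = {"strict_mode": False}
collections["segments_v2"] = {"web": "opts_web_v2"}
during = lookup_flag("web", "strict_mode")   # readers still see the old subtree

# A single write to the root switches every reader to the new subtree.
config_root["segments"] = "segments_v2"
after = lookup_flag("web", "strict_mode")
```

Each extra level costs one more keyed lookup, which is the slowdown mentioned above, but it also means the new subtree can be built without any reader observing it half-finished.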
Next, we’ll look into specific configuration update strategies employed in eBPF environments, highlighting their suitability for the system’s unique requirements and limitations.
Strategies for Safe Configuration Updates
Optimizing configuration management is essential in XDP/eBPF programming. This section outlines strategies to enhance program updates, ensuring high performance and flexibility. We discuss incremental updates and map/program replacement techniques to improve configuration management, enabling developers to manage updates in XDP/eBPF programs effectively while maintaining system integrity and performance.
Update Strategy #1: A Step-by-Step Transition
This strategy allows incremental configuration updates across several maps, useful when processing data in one map provides a lookup key for another. In such cases, where multiple map entries need to be updated, atomic transitions are not feasible. However, through precise and sequential update operations, it’s possible to update the configuration methodically, keeping it valid at each step.
With this approach, some operations on referenced configuration subtrees become safe if executed in the correct order. For example, in the context of classification and processing, the classification layer provides a lookup key for a matching security policy, suggesting that update operations should follow a specific sequence:
- Inserting a new security policy is safe since new policies are not yet referenced.
- Updating existing security policies is also safe, as updating them one at a time generally presents no issues. An atomic update would be desirable, but it offers no significant advantage here.
- Updating classification layer maps to reference new security policies and remove references to obsolete ones is safe.
- Purging unused security policies from the configuration is safe once they are no longer referenced.
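The ordering above can be checked with a small Python model. It is a sketch rather than eBPF code: two dictionaries stand in for the policy map and the classification map, and the keys, policy IDs, and `rate_limit` setting are hypothetical. After every single write, the model verifies that no classifier entry points at a missing policy.

```python
policies = {1: {"rate_limit": 1000}}      # policy id -> settings
classifier = {"10.0.2.0/24": 1}           # traffic class -> policy id

def is_consistent():
    """Every classifier entry must reference a policy that exists."""
    return all(pid in policies for pid in classifier.values())

def apply_update(new_policies, new_classifier):
    """Apply the update one write at a time, recording validity after each."""
    checks = []
    # Steps 1 and 2: insert new policies and update existing ones.
    for pid, settings in new_policies.items():
        policies[pid] = settings
        checks.append(is_consistent())
    # Step 3: repoint the classifier, then drop obsolete classifier entries.
    for key, pid in new_classifier.items():
        classifier[key] = pid
        checks.append(is_consistent())
    for key in [k for k in classifier if k not in new_classifier]:
        del classifier[key]
        checks.append(is_consistent())
    # Step 4: purge policies that nothing references any more.
    for pid in [p for p in policies if p not in set(classifier.values())]:
        del policies[pid]
        checks.append(is_consistent())
    return checks

checks = apply_update(
    {2: {"rate_limit": 500}},
    {"10.0.2.0/24": 2, "10.0.3.0/24": 2},
)
```

Reversing the order, for example purging policy 1 before the classifier is repointed, would leave a window in which a classified packet has no policy to look up.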
Even without atomic updates, it is possible to perform a safe update by correctly ordering the update procedure. This approach works best for independent maps that are not closely linked with other maps. Incremental updates, as opposed to updating the entire map at once, are recommended. For instance, incremental updates to hashmaps and arrays are perfectly safe. However, that is not the case with incremental updates to LPM maps, because the result of a lookup depends on the elements already present in the map. The same problem arises when creating the lookup key for another table requires manipulating elements from multiple maps. The classification layer, often implemented using several LPM and hash tables, is a perfect example of this.
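One way an incremental LPM update can go wrong is sketched below in Python. The `lpm_lookup` helper is a plain-Python stand-in for a longest-prefix-match map, and the prefixes and policy names are hypothetical. With a naive delete-then-insert ordering, a lookup in the middle of the update returns an answer that belongs to neither the old nor the new configuration.

```python
import ipaddress

def lpm_lookup(table, ip):
    """Return the value of the longest prefix in `table` that contains `ip`."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, value)
    return best[1] if best else None

# Old configuration: a /16 with a strict policy inside a permissive /8.
table = {"10.0.0.0/8": "default", "10.0.0.0/16": "strict"}
old_answer = lpm_lookup(table, "10.0.200.1")

# Goal: split the /16 into two /17 prefixes with their own policies.
del table["10.0.0.0/16"]
mid_answer = lpm_lookup(table, "10.0.200.1")   # falls through to the /8

table["10.0.0.0/17"] = "strict_low"
table["10.0.128.0/17"] = "strict_high"
new_answer = lpm_lookup(table, "10.0.200.1")
```

Here the address is briefly handled by the permissive `default` entry, even though both the old and the new configuration put it under a strict policy.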
Update Strategy #2: Map Replace
For maps that cannot be updated incrementally without inconsistencies, such as LPM maps, replacing the entire map is the solution. To replace a map for an eBPF program, a map of maps must be used. A user-space application can create a new map, populate it with the necessary entries, and then atomically replace the old one.
Dividing the configuration into separate maps, each describing the settings for a single entity, offers an added benefit of resource isolation and avoids the need to recreate a full configuration during minor updates. The configuration for each of the multiple entities can be stored in a replaceable map.
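The replace flow can be modeled in Python as follows. This is a sketch only: the outer dictionary stands in for a map of maps with one slot per entity, and the segment names and entries are hypothetical. The new inner map is fully populated before a single write publishes it.

```python
# Outer "map of maps": one replaceable inner map per configuration entity.
outer_map = {
    "segment_a": {"10.0.0.0/16": "strict"},
    "segment_b": {"10.1.0.0/16": "default"},
}

def replace_inner(entity, new_entries):
    """Swap one entity's inner map and return the old one for cleanup."""
    new_inner = dict(new_entries)     # 1. create and populate off to the side
    old_inner = outer_map[entity]
    outer_map[entity] = new_inner     # 2. a single write publishes the new map
    return old_inner                  # 3. caller releases (or unpins) the old map

reader_view = outer_map["segment_a"]  # a lookup that started before the swap
old = replace_inner("segment_a", {
    "10.0.0.0/17": "strict_low",
    "10.0.128.0/17": "strict_high",
})
```

A reader that fetched the inner map before the swap keeps a complete old table, a reader that arrives after it gets a complete new one, and `segment_b` is never touched.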
Although this approach has advantages, it also has drawbacks. The user-space application needs to unpin the previous map before it can reuse the previous pin path, since the replacement map cannot be pinned to the same location while the previous one still occupies it. This is particularly important to consider for long-lived programs that frequently update configurations and rely on map pinning for stability.
Update Strategy #3: Program Replace
When multiple maps are linked together, the map replace method may fail to work. Updating them individually can result in an inconsistent or invalid state that reflects neither the old nor the new intended configuration. The configuration becomes valid again only once all map updates are completed.
To address this issue, atomic updates should take place at a higher level. Although eBPF lacks a mechanism to replace a set of maps atomically, maps are usually linked to a specific eBPF program. Dividing the interconnected maps and corresponding code into separate eBPF programs, linked by tail calls, can address this.
Implementing this requires loading a new eBPF program, creating and filling maps for it, pinning both, and then updating the ProgMap from user space. This process is more labor-intensive than a simple map replacement but allows for simultaneous updates of maps and associated code, facilitating runtime code adjustments. However, this strategy may not always be efficient, especially when updating a single map entry in a complex program with multiple maps and sub-programs.
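As a rough model of this flow, the Python sketch below bundles code with its own set of maps and switches versions with one write to a program slot. It is not eBPF code: the closure stands in for a loaded program, the call through `prog_map` stands in for a tail call, and all names and values are hypothetical.

```python
def make_program(classifier, policies):
    """Bundle processing code with its own private, mutually consistent maps."""
    def program(packet_class):
        policy_id = classifier[packet_class]
        return policies[policy_id]
    return program

# Slot 0 of the program map holds the active processing program.
prog_map = {0: make_program({"web": 1}, {1: "limit_1000"})}

def entry_point(packet_class):
    """Stands in for the program that tail-calls into slot 0."""
    return prog_map[0](packet_class)

before = entry_point("web")

# Load the new program and fill its maps while the old one keeps running.
new_program = make_program(
    {"web": 7, "dns": 8},
    {7: "limit_500", 8: "limit_50"},
)

# One write switches the code and every map it depends on at once.
prog_map[0] = new_program
after = entry_point("web")
```

Because the new classifier and policy maps are only reachable through the new program, no packet can combine an old classifier entry with a new policy table.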
What You Should Know about Error Handling
This guide emphasizes the importance of updating configurations to prevent inconsistencies, while highlighting the complexities involved in error handling. When errors occur during an update, they can lead to ambiguous configurations, making automated recovery mechanisms essential to minimize manual corrections. Organizing errors into categories of recoverable and unrecoverable, with explicit recovery protocols for each, allows for efficient error management and ensures issues are resolved promptly and clearly:
- Recoverable: If a recoverable error occurs during an update, the entire process is halted without committing any changes to the configuration. Recovery can be initiated without risk.
- Unrecoverable: These errors impact specific configuration entities and require cautious recovery strategies that prevent broader system disruption.
Organizing updates by configuration entity rather than update type is crucial. This approach ensures that errors affect only the targeted configuration entity, rather than all of them simultaneously. For instance, in our example with classification and processing, where different network segments have defined classification rules and security policies, it’s more effective to update them in separate cycles based on network segments rather than by update type. That simplifies the implementation of automatic recovery procedures and provides clarity on which segment was impacted if an unrecoverable error occurs. Only one network segment will have an inconsistent configuration, while others will remain unaffected or can be easily switched to a new configuration.
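A per-segment update cycle with the two error categories might look like the Python sketch below. The exception classes, the validation rule, and the `simulate_failure` switch are hypothetical and exist only to illustrate the idea: validation failures are raised before anything is written, while a failure between two writes is reported for that one segment only.

```python
class RecoverableError(Exception):
    """Raised before anything is committed; retrying is safe."""

class UnrecoverableError(Exception):
    """Raised after a partial commit; the segment needs recovery."""

def update_segment(config, segment, new_settings, fail_midway=False):
    # Validate first, so a bad update never touches the live configuration.
    if not all(key in new_settings for key in ("classifier", "policy")):
        raise RecoverableError(segment)
    config[segment]["policy"] = new_settings["policy"]
    if fail_midway:  # simulated failure between two map writes
        raise UnrecoverableError(segment)
    config[segment]["classifier"] = new_settings["classifier"]

def update_all(config, updates, simulate_failure=()):
    """Update segment by segment and report the outcome for each one."""
    report = {}
    for segment, new_settings in updates.items():
        try:
            update_segment(config, segment, new_settings,
                           fail_midway=segment in simulate_failure)
            report[segment] = "updated"
        except RecoverableError:
            report[segment] = "unchanged"
        except UnrecoverableError:
            report[segment] = "needs_recovery"
    return report

config = {
    "a": {"classifier": "c1", "policy": "p1"},
    "b": {"classifier": "c1", "policy": "p1"},
    "c": {"classifier": "c1", "policy": "p1"},
}
updates = {
    "a": {"classifier": "c2", "policy": "p2"},
    "b": {"classifier": "c2", "policy": "p2"},
    "c": {"classifier": "c2"},              # incomplete: fails validation
}
report = update_all(config, updates, simulate_failure={"b"})
```

The report pinpoints segment `b` as the only one left in a mixed state, so an automated recovery procedure knows exactly where to act.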
Managing eBPF Program Lifecycles for Updates
The lifecycle management of an eBPF program is crucial, especially for programs requiring persistence, frequent updates, and state retention across different code instances. For example, if an XDP program requires frequent code updates while maintaining existing client sessions, it is essential to manage its lifetime effectively.
Developers focusing on maximizing flexibility while minimizing constraints should aim to retain only indispensable information between reloads, information that cannot be sourced from non-volatile storage. This approach allows for dynamic configuration adjustments within eBPF maps.
Simplifying the hot code reload process involves distinguishing state maps from configuration maps, reusing state maps during reloads, and repopulating configuration maps from non-volatile storage. Transitioning processing from an old to a new program and informing all eBPF map users about the change poses a significant challenge.
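The split between state maps and configuration maps can be illustrated with a short Python model. It is a sketch under stated assumptions: `STORED_CONFIG` is a hypothetical stand-in for non-volatile storage, and the session key and `rate_limit` setting are invented for the example.

```python
import json

# Hypothetical stand-in for non-volatile storage (a file or database in practice).
STORED_CONFIG = '{"rate_limit": 500}'

def load_instance(previous=None):
    """Build the map set for a new program instance."""
    return {
        # State map: reuse the same object so client sessions survive the reload.
        "state": previous["state"] if previous else {},
        # Configuration map: always rebuilt from storage, never carried over.
        "config": json.loads(STORED_CONFIG),
    }

old = load_instance()
old["state"]["203.0.113.5:443"] = "established"   # a live client session
old["config"]["rate_limit"] = 9999                # drift in the running instance

new = load_instance(previous=old)
```

After the reload, the new instance shares the session table with the old one but starts from the stored configuration, so nothing about the old configuration layout constrains the new code.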
Two main approaches are commonly used:
- Atomic Program Replacement. Directly attaching the XDP program to a network interface and atomically swapping it during updates. This approach may be less suitable for large, complex eBPF programs that interact with multiple user-space programs and maps.
- libxdp-like Approach. A dispatcher program is attached to the network interface and uses tail calls to hand each packet to the next program in the ProgMap, which performs the actual processing. The dispatcher program, besides managing map usage and pinning, coordinates multiple processing programs, enabling quick transitions between them.
The hot reload process ensures prompt detection and correction of configuration issues, quickly reverting to a previous stable version when necessary. For advanced scenarios like A/B testing, the dispatcher can augment itself with a classification table to direct specific traffic flows to a new version of an XDP program.
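A dispatcher with an A/B classification table can be modeled in Python as shown below. This is a sketch, not the libxdp implementation: the slot names, the source-subnet table, and the `rollback` and `promote` helpers are hypothetical, and the function call through `prog_map` stands in for a tail call.

```python
import ipaddress

def program_v1(src_ip):
    return "v1"

def program_v2(src_ip):
    return "v2"

prog_map = {"stable": program_v1, "candidate": program_v2}
ab_table = {"10.0.3.0/24": "candidate"}   # source subnet -> ProgMap slot

def dispatcher(src_ip):
    """Send selected flows to the candidate program, everything else to stable."""
    addr = ipaddress.ip_address(src_ip)
    for subnet, slot in ab_table.items():
        if addr in ipaddress.ip_network(subnet):
            return prog_map[slot](src_ip)
    return prog_map["stable"](src_ip)

def rollback():
    """Stop routing any traffic to the candidate version."""
    ab_table.clear()

def promote():
    """Make the candidate the stable version for all traffic."""
    prog_map["stable"] = prog_map["candidate"]
    ab_table.clear()
```

Calling `rollback()` sends all traffic back to the stable program immediately, while `promote()` completes the transition once the candidate has proven itself on the test flows.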
Conclusion
We’re constantly pushing the boundaries of network security and performance optimization to combat emerging threats, including by using advanced eBPF/XDP features. As we continue to evolve our packet processing core, we remain committed to delivering cutting-edge solutions that ensure our customers’ networks are both robust and agile.
For proven DDoS mitigation, try Gcore DDoS Protection. Our all-in-one service provides real-time DDoS protection for websites, apps, and servers, with a 99.99% SLA for your peace of mind.