How we use regex in XDP for packet filtering: Our open-source solution and benchmarks

In our DDoS Protection, we combine XDP with regular expressions (regex). In this article, we'll explain why we adopted this approach (regex in XDP) and how we brought the two together using a third-party engine and our own API. We'll also share our open-source solution for handling regex in XDP, along with benchmark results.

Why we started using the XDP framework

When we first started providing DDoS Protection as a service, we used a few dedicated servers (nodes) running DPDK (a framework for fast packet processing) and filtered traffic with regular expressions. This combination worked well, and we successfully protected our clients' applications. Over time, however, the infrastructure could no longer keep up with the growing volume of DDoS attacks, which rose from 300 Gbps in 2021 to 700 Gbps in 2022.

DPDK has a fundamental limitation: it requires exclusive access to network adapters. This makes it almost impossible (both difficult and impractical) to run alongside other applications on the same server. As a result, expanding the infrastructure without changing its architecture meant purchasing additional dedicated nodes for DPDK. That seemed economically inefficient, so we decided to look for another solution.

As you may know, in addition to DDoS Protection, we provide CDN infrastructure with almost a thousand CDN nodes. So it made sense to use them for both content delivery and traffic filtering: the more nodes we have, the more effectively our protection can cope with the growing volume of attacks.

We can't use DPDK on CDN nodes because it requires dedicated servers, so we decided to switch to the XDP framework. Its main difference from DPDK is that it integrates easily into the kernel network stack alongside other applications. Previously, we needed dedicated servers with DPDK (Figure 1), but now we can integrate DDoS Protection into CDN servers (Figure 2) and scale the infrastructure.

Figure 1. DDoS Protection with dedicated servers (DPDK)
Figure 2. DDoS Protection integrated into CDN servers (XDP)

XDP suits us for the following reasons:

  • Cost efficiency. The framework can run on servers alongside any edge-network applications, such as web and DNS servers, so it doesn't require expensive dedicated equipment, and developers don't need to spend hours integrating other applications with XDP.
  • Attack detection speed. Because the framework can be installed on hundreds of CDN servers, DDoS Protection sits closer to both client applications and the sources of malicious traffic. As a result, attacks are stopped sooner and don't penetrate deep into the infrastructure.

But it also has several drawbacks:

  • Lower performance. An individual XDP node performs worse than a DPDK node, but because many more nodes take part in filtering, the overall efficiency of the solution is higher.
  • Inability to handle regex. XDP has no built-in regular expression engine, so we had to find a way to add regex processing to it.

Why we’ve kept using regex to filter traffic

There are two approaches to filtering out malicious traffic in DDoS Protection: packet parsers and regular expression (regex) matching.

Packet parsers are manually written filters programmed to detect and block suspicious activity in applications that use a particular protocol. Writing such parsers requires a lot of programming work, especially if the DDoS protection must support new protocols quickly.

Regex matching based on analysis of packet payloads takes less time to create filters than writing packet parsers. It is also a more flexible approach that lets packets be processed more efficiently with a lower kernel load.

Packets that are sent to our customers’ applications are checked in two modes:

  • Reaction to attacks (manual mode). We analyze malicious traffic that carries a specific payload (pattern), create regular expressions that match this payload, and apply them to traffic. All requests that contain a similar payload are then filtered automatically.
  • Game connection protection mode. Many Gcore clients are online gaming service providers, whose traffic consists of small packets sent over UDP. Packets sent to gaming services have a strict structure that can be described with regular expressions. We create regular expressions for each client's game service and use them as an allowlist: packets that match are allowed, and all others are blocked.
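To make the difference between the two modes concrete, here is a minimal user-space sketch of the verdict logic. It is an illustration under stated assumptions, not our production code: POSIX regex.h stands in for Hyperscan, payloads are assumed to be NUL-terminated text (real game traffic is often binary), and the names check_packet, MODE_MANUAL, and MODE_ALLOWLIST are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two checking modes described above. */
enum check_mode {
    MODE_MANUAL,     /* reaction to attacks: drop packets that match */
    MODE_ALLOWLIST   /* game protection: pass only packets that match */
};

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* True if the NUL-terminated payload matches any POSIX extended regex.
 * Patterns are compiled on every call for brevity only. */
static bool matches_any(const char *payload, const char *const *patterns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;                       /* skip patterns that fail to compile */
        int rc = regexec(&re, payload, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return true;
    }
    return false;
}

static enum verdict check_packet(enum check_mode mode, const char *payload,
                                 const char *const *patterns, size_t n)
{
    bool hit = matches_any(payload, patterns, n);

    if (mode == MODE_MANUAL)
        return hit ? VERDICT_DROP : VERDICT_PASS;   /* blocklist semantics */
    return hit ? VERDICT_PASS : VERDICT_DROP;       /* allowlist: unknown traffic is blocked */
}
```

Note the asymmetry: in manual mode an unmatched packet passes, while in allowlist mode an unmatched packet is blocked. A real filter would compile its pattern database once (as Hyperscan does) and reuse it for every packet.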

Packet processing with regex is arranged as follows:

  • Dissector. Analyzes each packet and splits it into headers.
  • Flow router. Routes packets to the appropriate protection profile, a standard set of traffic protection rules.
  • Policy Pipeline. Applies special rules (countermeasures) to the dissected packets. One of these countermeasures is regular expression matching.
  • Verdict. Passes or blocks a packet based on the countermeasure checks.
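The four stages can be sketched in C as follows. This is a simplified, hypothetical model rather than our implementation: it assumes IPv4 traffic, parses ports only for UDP, routes flows to a protection profile by destination port, and drops malformed packets. The names struct dissected, struct profile, and process_packet are illustrative, and drop_empty_payload stands in for real countermeasures such as a regex check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_PASS, VERDICT_DROP };

/* Dissector output: the headers the later stages need. */
struct dissected {
    uint8_t ip_proto;           /* 17 = UDP */
    uint16_t dst_port;          /* 0 unless the packet is UDP */
    const uint8_t *payload;
    size_t payload_len;
};

/* A countermeasure returns true if the packet must be dropped. */
typedef bool (*countermeasure_fn)(const struct dissected *d);

/* Protection profile: an ordered set of countermeasures. */
struct profile {
    const countermeasure_fn *rules;
    size_t nr_rules;
};

/* 1. Dissector: split a raw IPv4 packet into headers and payload. */
static bool dissect(const uint8_t *pkt, size_t len, struct dissected *d)
{
    if (len < 20 || (pkt[0] >> 4) != 4)
        return false;                       /* not a well-formed IPv4 header */
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl)
        return false;
    d->ip_proto = pkt[9];
    d->dst_port = 0;
    d->payload = NULL;
    d->payload_len = 0;
    if (d->ip_proto != 17)
        return true;                        /* only UDP is parsed further here */
    if (len < ihl + 8)
        return false;                       /* truncated UDP header */
    d->dst_port = (uint16_t)((pkt[ihl + 2] << 8) | pkt[ihl + 3]);
    d->payload = pkt + ihl + 8;
    d->payload_len = len - ihl - 8;
    return true;
}

/* Example countermeasure; a regex check would be another rule of this type. */
static bool drop_empty_payload(const struct dissected *d)
{
    return d->ip_proto == 17 && d->payload_len == 0;
}

static enum verdict process_packet(const uint8_t *pkt, size_t len,
                                   const struct profile *profiles,
                                   const uint16_t *ports, size_t nr_profiles)
{
    struct dissected d;
    const struct profile *p = NULL;

    if (!dissect(pkt, len, &d))
        return VERDICT_DROP;                /* malformed packets are dropped */

    /* 2. Flow router: pick the profile configured for this UDP port. */
    if (d.ip_proto == 17)
        for (size_t i = 0; i < nr_profiles; i++)
            if (ports[i] == d.dst_port)
                p = &profiles[i];
    if (p == NULL)
        return VERDICT_PASS;                /* no protection profile for this flow */

    /* 3. Policy pipeline and 4. Verdict: the first countermeasure that fires drops. */
    for (size_t i = 0; i < p->nr_rules; i++)
        if (p->rules[i](&d))
            return VERDICT_DROP;
    return VERDICT_PASS;
}
```

Keeping countermeasures behind a common function type is what lets a profile mix cheap header checks with more expensive regex checks in one ordered pipeline.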

How we adapt regex processing in the XDP context: Challenges and solutions

Regex matching is resource-intensive, and we planned to check millions of packets against regular expressions of varying complexity. That's why performance was one of the key requirements for the regex engine.

We chose Hyperscan, developed by Intel, as the best engine available. It is open source under a GPL-compatible license, it is fast because it uses AVX2/AVX-512 vector instructions, and it is an industry standard for DPI applications.

In adapting the processing of regex in XDP, we encountered several challenges, which we describe below.

Challenge 1. The limitations of eBPF (the verifier restricts program size, loops, and stack usage) don't allow regex filters to be implemented as part of the XDP program.

Solution. We rebuilt the Hyperscan engine as a loadable Linux kernel module that provides eBPF helpers. Hyperscan is designed for regex processing in DPI (Deep Packet Inspection) systems: it checks whether a packet's payload matches any of a set of predefined regular expressions.

Challenge 2. eBPF helpers from loadable modules can’t be registered for XDP.

Solution. eBPF helpers in loadable modules were first introduced in Linux 5.16, but registering them for XDP wasn't possible until Linux 5.18. Because we were running Linux 5.17 during development, we had to patch the kernel to add that capability. Mainline kernels from 5.18 onward don't require this patch.
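To check which case a given machine falls into, a simple version check is enough. The sketch below encodes the version thresholds from the paragraph above (5.16 for helpers in loadable modules, 5.18 for registering them for XDP); the function and enum names are hypothetical. Treat it as a heuristic: distribution kernels may backport features, so the release string does not always reflect what a kernel actually supports.

```c
#include <stdio.h>
#include <sys/utsname.h>

enum module_helper_support {
    SUPPORT_NONE,         /* < 5.16 (or unparseable): no helpers from loadable modules */
    SUPPORT_NEEDS_PATCH,  /* 5.16-5.17: module helpers exist but can't be registered for XDP */
    SUPPORT_NATIVE        /* >= 5.18: registration for XDP works without patching */
};

/* Compares major.minor against want_major.want_minor (-1, 0 or 1). */
static int version_cmp(int major, int minor, int want_major, int want_minor)
{
    if (major != want_major)
        return major < want_major ? -1 : 1;
    if (minor != want_minor)
        return minor < want_minor ? -1 : 1;
    return 0;
}

/* Classifies a release string such as "5.17.0-1-amd64" by its leading X.Y. */
static enum module_helper_support helper_support(const char *release)
{
    int major, minor;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return SUPPORT_NONE;
    if (version_cmp(major, minor, 5, 18) >= 0)
        return SUPPORT_NATIVE;
    if (version_cmp(major, minor, 5, 16) >= 0)
        return SUPPORT_NEEDS_PATCH;
    return SUPPORT_NONE;
}

/* Checks the running kernel, as reported by uname(2). */
static enum module_helper_support running_kernel_support(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return SUPPORT_NONE;
    return helper_support(u.release);
}
```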

Challenge 3. Vector (FPU) instructions are not meant to be used during packet processing in the Linux kernel.

Solution. We save the FPU register state when entering the module for regex processing and restore it on exit. This is done per packet and only when required, so packets that don't need regex processing are unaffected.
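The per-packet save and restore can be sketched as below. The module's actual code isn't shown in this article, so this is an assumption-based illustration: on x86, the kernel's kernel_fpu_begin() and kernel_fpu_end() bracket SIMD use, and outside the kernel the sketch substitutes counting stubs so the control flow can be compiled and tested. simd_scan_fn and demo_scan are hypothetical stand-ins for a Hyperscan scan. A production module may additionally need to check that SIMD is usable in the current context (on x86, irq_fpu_usable()).

```c
#ifdef __KERNEL__
#include <linux/types.h>      /* bool, size_t */
#include <asm/fpu/api.h>      /* x86: kernel_fpu_begin(), kernel_fpu_end() */
#else
#include <stdbool.h>
#include <stddef.h>
/* User-space stand-ins: they only count how often the FPU state
 * would be saved and restored. */
static int fpu_saves, fpu_restores;
static void kernel_fpu_begin(void) { fpu_saves++; }
static void kernel_fpu_end(void) { fpu_restores++; }
#endif

/* Hypothetical vectorised matcher; in the real module this is a Hyperscan scan. */
typedef bool (*simd_scan_fn)(const unsigned char *data, size_t len);

/* Runs the SIMD matcher only for packets that need regex processing and
 * brackets it with an FPU state save/restore. */
static bool regex_scan_packet(const unsigned char *data, size_t len,
                              bool needs_regex, simd_scan_fn scan)
{
    bool matched;

    if (!needs_regex)
        return false;       /* no regex rule: skip the FPU save/restore entirely */

    kernel_fpu_begin();     /* save FPU/SIMD state before running vector code */
    matched = scan(data, len);
    kernel_fpu_end();       /* restore it before returning to the network stack */
    return matched;
}

/* Trivial stand-in matcher for testing: "matches" a leading 0xFF byte. */
static bool demo_scan(const unsigned char *data, size_t len)
{
    return len > 0 && data[0] == 0xFF;
}
```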

What open-source solution do we offer to the community: eBPF API for handling regex in XDP

If your infrastructure needs to handle regular expressions in XDP, you can use a ready-made solution from our developers instead of building one from scratch.

Our custom eBPF helper ‘bpf_xdp_scan_bytes()’ can be called in the same way as any other eBPF helper:

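```c
/* Scan the packet payload against the regex database registered as regex_id. */
struct rex_scan_attr attr = {
    .database_id = regex_id,
    .handler_flags = REX_SINGLE_SHOT,
    .nr_events = 0,
    .last_event = {},
};

err = bpf_xdp_scan_bytes(xdp, payload_off, payload_len, &attr);
if (err < 0)
    return XDP_ABORTED;  /* assumed: treat a failed scan as a program error */

/* Drop the packet if any pattern matched; otherwise let it through. */
return (attr.nr_events > 0) ? XDP_DROP : XDP_PASS;
```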

To evaluate a regex against the packet buffer, first add it to the loadable module, then reference its identifier when calling the eBPF helper:

  1. Create a node using mkdir under /sys/kernel/config/rex.
  2. Compile the pattern database:
```shell
echo '101:/foobar/' > patterns.txt
echo '201:/a{3,10}/' >> patterns.txt
build/bin/hscollider -e patterns.txt -ao out/ -nl
```
  3. Upload the compiled regex to /sys/kernel/config/rex//database:
```shell
dd if=$(echo out/*.db) of=/sys/kernel/config/rex/hello/database
```
  4. Read or set a new regex identifier at /sys/kernel/config/rex//id.
  5. Pass the regex identifier to the eBPF program and use it as the helper argument (the database_id field of struct rex_scan_attr).
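Steps 1, 3, and 4 are plain file operations, so they can also be scripted from C instead of the shell (compiling the database in step 2 is still done with hscollider). The helpers below are hypothetical: they take the configfs root (/sys/kernel/config/rex on a real system) as a parameter so the sketch can run against any directory, and they assume the id attribute holds a decimal number. Because configfs binary attributes are typically committed when the file is closed, the result of fclose() is checked.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Step 1: create the node directory; an existing node is not an error. */
static int rex_create_node(const char *root, const char *name)
{
    char path[4096];

    snprintf(path, sizeof path, "%s/%s", root, name);
    if (mkdir(path, 0755) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/* Step 3: copy a compiled Hyperscan database file into <node>/database. */
static int rex_upload_db(const char *root, const char *name, const char *db_file)
{
    static char buf[65536];
    char path[4096];
    size_t n;
    int rc = 0;
    FILE *in, *out;

    if ((in = fopen(db_file, "rb")) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/%s/database", root, name);
    if ((out = fopen(path, "wb")) == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    if (ferror(in))
        rc = -1;
    if (fclose(out) != 0)      /* the write may only be committed on close */
        rc = -1;
    fclose(in);
    return rc;
}

/* Step 4 (set): write a new regex identifier to <node>/id. */
static int rex_write_id(const char *root, const char *name, long id)
{
    char path[4096];
    FILE *f;
    int ok;

    snprintf(path, sizeof path, "%s/%s/id", root, name);
    if ((f = fopen(path, "w")) == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", id) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Step 4 (read): read the identifier to pass to the eBPF program (step 5). */
static long rex_read_id(const char *root, const char *name)
{
    char path[4096];
    long id = -1;
    FILE *f;

    snprintf(path, sizeof path, "%s/%s/id", root, name);
    if ((f = fopen(path, "r")) == NULL)
        return -1;
    if (fscanf(f, "%ld", &id) != 1)
        id = -1;
    fclose(f);
    return id;
}
```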

The full source code is available on the Gcore GitHub account: https://github.comGcorelinux-regex-module.

What benchmarks do we have on regex usage in XDP

Our DDoS filtering solution is based on 3rd Generation Intel® Xeon® Scalable processors and 100GbE Intel® Ethernet Network Adapter E810. Intel® Hyperscan enabled high-performance pattern matching across data streams.

Intel® provided expert insight, including on the XDP technology used for packet filtering. Running the new software on the latest 3rd Generation Intel® Xeon® Scalable processors increased the filtering capacity from 100 Gbps to up to 400 Gbps, or 200 million packets per second.

The charts below show the test results.

  • The line rate (blue line) is the maximum network throughput of 4 × 100 Gbps interfaces.
  • The base (red line) shows how XDP performs without regular expressions.
  • The bottom three lines show packet handling with regular expressions.

Conclusion. With packets larger than 512 bytes, the system operates at line rate while filtering traffic effectively. With packets smaller than 512 bytes, the packet rate, and therefore the load on the system, is much higher, so the system can't sustain line rate and runs at roughly 40–50% of it.
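The gap between small and large packets follows from simple arithmetic: at a fixed bit rate, smaller packets mean more packets per second, and each packet costs a roughly fixed amount of processing. The sketch below computes the maximum packet rate of a link for a given frame size, assuming standard Ethernet framing with 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the benchmark description above doesn't say how packet sizes were counted.

```c
/* Bytes each Ethernet frame occupies on the wire beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Maximum packets per second on a link of link_bps bits per second for
 * frames of frame_bytes (Ethernet header through FCS). */
static double line_rate_pps(double link_bps, double frame_bytes)
{
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}
```

For 4 × 100 Gbps, this gives about 595 million packets per second at 64 bytes versus about 94 million at 512 bytes, so the smallest packets demand roughly six times the packet rate.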

We are satisfied with the results, even though small-packet performance is lower. Because real DDoS attacks tend to use large packets, the efficiency and speed are acceptable in practice.

The tests show that, although regex processing is resource-intensive, running it in XDP is fast enough to handle large volumes of traffic.
