Re: [RFC PATCH] bpf: introduce new bpf map type BPF_MAP_TYPE_WILDCARD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[sorry for the noise, got a bounce from the list, resend with the
message trimmed]

On 09/07, Anton Protopopov wrote:
Add a new map, BPF_MAP_TYPE_WILDCARD, which provides means to implement generic online packet classification. Here "online" stands for "fast lookups and fast updates", and "generic" means that a user can create maps with custom lookup schemes—different numbers of fields and different interpretation of individual
fields (prefix bitmasks, ranges, and direct matches).

In particular, in Cilium we have several use cases for such a map:

   * XDP Prefilter is a simple XDP-based DDoS mitigation system provided by
Cilium. At the moment it only supports filtering by source CIDRs. It would benefit from using this new map, as it allows to utilize wildcard rules
     without big penalty comparing to one hash or LPM lookup utilized now.

* XDP Packet Recorder (see https://lpc.events/event/11/contributions/953/)

* K8S and Cilium Network Policies: as of k8s 1.25 port ranges are considered to be a stable feature, and this map allows to implement this easily (and also to provide more sophisticated filtering for Cilium Network Policies)

Keys for wildcard maps are defined using the struct wildcard_key structure. Besides the type field, it contains the data field of an arbitrary size. To
educate map about what's contained inside the data field, two additional
structures are used. The first one, struct wildcard_desc, also of arbitrary
size, tells how many fields are contained inside data, and the struct
wildcard_rule_desc structure defines how individual fields look like.

Fields (rules) can be of three types: BPF_WILDCARD_RULE_{PREFIX,RANGE,MATCH}.

[..]

The PREFIX rule means that inside data we have a binary value and a binary
(prefix) mask:

                size             u32
         <----------------> <----------->
    ... |    rule value    |   prefix   | ...

Here rule value is a binary value, e.g., 123.324.128.0, and prefix is a u32 bit variable; we only use lower 8 bits of it. We allow 8, 16, 32, 64, and 128 bit
values for PREFIX rules.

I haven't looked at the code, so pardon stupid questions. This sounds
like optimized LPM-trie?

If not, what's the difference?

If yes, can this be special cased/optimized in the existing LPM-trie
optimization? I think we've tried in the past, mostly gave up and
"fixed" by caching the state in the socket local storage.

Also, fixed 8/16/32/64 prefixes seems like a big limitation? At least if
I were to store ipv4 from the classless (cidr) world..

The RANGE rule is determined by two binary values: minimum and maximum, treated
as unsigned integers of appropriate size:

                size               size
         <----------------> <---------------->
    ... |  min rule value  |  max rule value  | ...

We only allow the 8, 16, 32, and 64-bit for RANGE rules.

That seems useful. I was thinking about similar 'rangemap' where
we can effectively store and lookup [a,b] ranges. Might be useful
in firewalls for storing lists of ports efficiently. So why not
a separate map instead? Are you able to share a lot of code with
the general matching map?

The MATCH rule is determined by one binary value, and is basically the same as
(X,sizeof(X)*8) PREFIX rule, but can be processed a bit faster:

                size
         <---------------->
    ... |    rule value    | ...

To speed up processing all the rules, including the prefix field, should be
stored in host byte order, and all elements in network byte order. 16-byte
fields are stored as {lo,hi}—lower eight bytes, then higher eight bytes.

For elements only values are stored.

Can these optimization be auto-magically derived whenever PREFIX map
with the right values is created? Why put this decision on the users?

To simplify definition of key structures, the BPF_WILDCARD_DESC_{1,2,3,4,5}
macros should be used. For example, one can define an IPv4 4-tuple keys as
follows:

    BPF_WILDCARD_DESC_4(
         capture4_wcard,
         BPF_WILDCARD_RULE_PREFIX, __u32, saddr,
         BPF_WILDCARD_RULE_PREFIX, __u32, daddr,
         BPF_WILDCARD_RULE_RANGE, __u16, sport,
         BPF_WILDCARD_RULE_RANGE, __u16, dport
    );

This macro will define the following structure:

    struct capture4_wcard_key {
         __u32 type;
         __u32 priority;
         union {
             struct {
                     __u32 saddr;
                     __u32 saddr_prefix;
                     __u32 daddr;
                     __u32 daddr_prefix;
                     __u16 sport_min;
                     __u16 sport_max;
                     __u16 dport_min;
                     __u16 dport_max;
             } __packed rule;
             struct {
                     __u32 saddr;
                     __u32 daddr;
                     __u16 sport;
                     __u16 dport;
             } __packed;
         };
    } __packed;

Here type field should contain either BPF_WILDCARD_KEY_RULE or
BPF_WILDCARD_KEY_ELEM so that kernel can differentiate between rules and
elements. The rule structure is used to define (and lookup) rules, the unnamed
structure can be used to specify elements when matching them with rules.

In order to simplify definition of a corresponding struct wildcard_desc, the
BPF_WILDCARD_DESC_* macros will create yet another structure:

    struct capture4_wcard_desc {
         __uint(n_rules, 4);
         struct {
                 __uint(type, BPF_WILDCARD_RULE_PREFIX);
                 __uint(size, sizeof(__u32));
         } saddr;
         struct {
                 __uint(type, BPF_WILDCARD_RULE_PREFIX);
                 __uint(size, sizeof(__u32));
         } daddr;
         struct {
                 __uint(type, BPF_WILDCARD_RULE_RANGE);
                 __uint(size, sizeof(__u16));
         } sport;
         struct {
                 __uint(type, BPF_WILDCARD_RULE_RANGE);
                 __uint(size, sizeof(__u16));
         } dport;
    };

This structure can be used in a (BTF) map definition as follows:

     __type(wildcard_desc, struct capture4_wcard_desc);

Then libbpf will create a corresponding struct wildcard_desc and pass it to
kernel in bpf_attr using new map_extra_data/map_extra_data_size fields.

[..]

The map implementation allows users to specify which algorithm to use to store
rules and lookup packets. Currently, three algorithms are supported:

   * Brute Force (suitable for map sizes of about 32 or below elements)

   * Tuple Merge (a variant of the Tuple Merge algorithm described in the
     "TupleMerge: Fast Software Packet Processing for Online Packet
Classification" white paper, see https://nonsns.github.io/paper/rossi19ton.pdf.
     The Tuple Merge algorithm is not protected by any patents.)

* Static Tuple Merge (a variant of Tuple Merge where a set of lookup tables
     is directly provided by a user)

As a user that has no clue how this map works, how do I decide which
algorithm to use? What I like with the current maps is that they don't
leak too much of their inner state. These controls seems a bit low
level?




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux