Re: [RFC PATCH] bpf: introduce new bpf map type BPF_MAP_TYPE_WILDCARD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Anton,

On Wed, Sep 07, 2022 at 10:01:40AM +0200, Anton Protopopov wrote:
> Add a new map, BPF_MAP_TYPE_WILDCARD, which provides means to implement generic
> online packet classification. Here "online" stands for "fast lookups and fast
> updates", and "generic" means that a user can create maps with custom lookup
> schemes—different numbers of fields and different interpretation of individual
> fields (prefix bitmasks, ranges, and direct matches).
> 
> In particular, in Cilium we have several use cases for such a map:
> 
>   * XDP Prefilter is a simple XDP-based DDoS mitigation system provided by
>     Cilium. At the moment it only supports filtering by source CIDRs. It would
>     benefit from using this new map, as it allows to utilize wildcard rules
>     without big penalty comparing to one hash or LPM lookup utilized now.
> 
>   * XDP Packet Recorder (see https://lpc.events/event/11/contributions/953/)
> 
>   * K8S and Cilium Network Policies: as of k8s 1.25 port ranges are considered
>     to be a stable feature, and this map allows to implement this easily (and
>     also to provide more sophisticated filtering for Cilium Network Policies)
> 
> Keys for wildcard maps are defined using the struct wildcard_key structure.
> Besides the type field, it contains the data field of an arbitrary size. To
> educate map about what's contained inside the data field, two additional
> structures are used. The first one, struct wildcard_desc, also of arbitrary
> size, tells how many fields are contained inside data, and the struct
> wildcard_rule_desc structure defines how individual fields look like.
> 
> Fields (rules) can be of three types: BPF_WILDCARD_RULE_{PREFIX,RANGE,MATCH}.
> The PREFIX rule means that inside data we have a binary value and a binary
> (prefix) mask:
> 
>                size             u32
>         <----------------> <----------->
>    ... |    rule value    |   prefix   | ...
> 
> Here rule value is a binary value, e.g., 123.324.128.0, and prefix is a u32 bit
> variable; we only use lower 8 bits of it. We allow 8, 16, 32, 64, and 128 bit
> values for PREFIX rules.
> 
> The RANGE rule is determined by two binary values: minimum and maximum, treated
> as unsigned integers of appropriate size:
> 
>                size               size
>         <----------------> <---------------->
>    ... |  min rule value  |  max rule value  | ...
> 
> We only allow the 8, 16, 32, and 64-bit for RANGE rules.
> 
> The MATCH rule is determined by one binary value, and is basically the same as
> (X,sizeof(X)*8) PREFIX rule, but can be processed a bit faster:
> 
>                size
>         <---------------->
>    ... |    rule value    | ...
> 
> To speed up processing all the rules, including the prefix field, should be
> stored in host byte order, and all elements in network byte order. 16-byte
> fields are stored as {lo,hi}—lower eight bytes, then higher eight bytes.
> 
> For elements only values are stored.
> 
> To simplify definition of key structures, the BPF_WILDCARD_DESC_{1,2,3,4,5}
> macros should be used. For example, one can define an IPv4 4-tuple keys as
> follows:
> 
>    BPF_WILDCARD_DESC_4(
>         capture4_wcard,
>         BPF_WILDCARD_RULE_PREFIX, __u32, saddr,
>         BPF_WILDCARD_RULE_PREFIX, __u32, daddr,
>         BPF_WILDCARD_RULE_RANGE, __u16, sport,
>         BPF_WILDCARD_RULE_RANGE, __u16, dport
>    );
> 
> This macro will define the following structure:
> 
>    struct capture4_wcard_key {
>         __u32 type;
>         __u32 priority;
>         union {
>             struct {
>                     __u32 saddr;
>                     __u32 saddr_prefix;
>                     __u32 daddr;
>                     __u32 daddr_prefix;
>                     __u16 sport_min;
>                     __u16 sport_max;
>                     __u16 dport_min;
>                     __u16 dport_max;
>             } __packed rule;
>             struct {
>                     __u32 saddr;
>                     __u32 daddr;
>                     __u16 sport;
>                     __u16 dport;
>             } __packed;
>         };
>    } __packed;
> 
> Here type field should contain either BPF_WILDCARD_KEY_RULE or
> BPF_WILDCARD_KEY_ELEM so that kernel can differentiate between rules and
> elements. The rule structure is used to define (and lookup) rules, the unnamed
> structure can be used to specify elements when matching them with rules.
> 
> In order to simplify definition of a corresponding struct wildcard_desc, the
> BPF_WILDCARD_DESC_* macros will create yet another structure:
> 
>    struct capture4_wcard_desc {
>         __uint(n_rules, 4);
>         struct {
>                 __uint(type, BPF_WILDCARD_RULE_PREFIX);
>                 __uint(size, sizeof(__u32));
>         } saddr;
>         struct {
>                 __uint(type, BPF_WILDCARD_RULE_PREFIX);
>                 __uint(size, sizeof(__u32));
>         } daddr;
>         struct {
>                 __uint(type, BPF_WILDCARD_RULE_RANGE);
>                 __uint(size, sizeof(__u16));
>         } sport;
>         struct {
>                 __uint(type, BPF_WILDCARD_RULE_RANGE);
>                 __uint(size, sizeof(__u16));
>         } dport;
>    };
> 
> This structure can be used in a (BTF) map definition as follows:
> 
>     __type(wildcard_desc, struct capture4_wcard_desc);
> 
> Then libbpf will create a corresponding struct wildcard_desc and pass it to
> kernel in bpf_attr using new map_extra_data/map_extra_data_size fields.
> 
> The map implementation allows users to specify which algorithm to use to store
> rules and lookup packets. Currently, three algorithms are supported:
> 
>   * Brute Force (suitable for map sizes of about 32 or below elements)
> 
>   * Tuple Merge (a variant of the Tuple Merge algorithm described in the
>     "TupleMerge: Fast Software Packet Processing for Online Packet
>     Classification" white paper, see https://nonsns.github.io/paper/rossi19ton.pdf.
>     The Tuple Merge algorithm is not protected by any patents.)
> 
>   * Static Tuple Merge (a variant of Tuple Merge where a set of lookup tables
>     is directly provided by a user)
> 
> To select a specific algorithm one should set a flag in the map_extra field,
> see examples below.
> 
> Example 1. The following defines a brute force map to match IPv4
> source/destination CIDRs:
> 
>     BPF_WILDCARD_DESC_2(
>             capture4_l3,
>             BPF_WILDCARD_RULE_PREFIX, __u32, saddr,
>             BPF_WILDCARD_RULE_PREFIX, __u32, daddr,
>     );
> 
>     struct {
>             __uint(type, BPF_MAP_TYPE_WILDCARD);
>             __type(key, struct capture4_l3_key);
>             __type(value, __u64);
>             __uint(max_entries, 16);
>             __uint(map_flags, BPF_F_NO_PREALLOC);
>             __uint(map_extra, BPF_WILDCARD_F_ALGORITHM_BF);
>             __type(wildcard_desc, struct capture4_l3_desc);
>     } filter_v4_bf __section(".maps");
> 
> Example 2. The only change requred for the previous example to use Tuple Merge
> is to select a different algorithm:
> 
>     struct {
>             __uint(type, BPF_MAP_TYPE_WILDCARD);
>             __type(key, struct capture4_l3_key);
>             __type(value, __u64);
>             __uint(max_entries, 16);
>             __uint(map_flags, BPF_F_NO_PREALLOC);
>             __uint(map_extra, BPF_WILDCARD_F_ALGORITHM_TM);
>             __type(wildcard_desc, struct capture4_l3_desc);
>     } filter_v4_tm __section(".maps");
> 
> Example 3. Let's update the previous example to use Static Tuple Merge:
> 
>     BPF_WILDCARD_TM_OPTS(
>             capture4_l3,
>             BPF_WILDCARD_TM_TABLES_3(saddr, 16, 0, 16);
>             BPF_WILDCARD_TM_TABLES_3(daddr, 16, 16, 0);
>     );
> 
>     struct {
>             __uint(type, BPF_MAP_TYPE_WILDCARD);
>             __type(key, struct capture4_l3_key);
>             __type(value, __u64);
>             __uint(max_entries, 100000);
>             __uint(map_flags, BPF_F_NO_PREALLOC);
>             __uint(map_extra, BPF_WILDCARD_F_ALGORITHM_TM |
>                               BPF_WILDCARD_F_TM_STATIC_POOL |
>                               BPF_WILDCARD_F_TM_POOL_LIST);
>             __type(wildcard_tm_opts, struct capture4_l3_opts);
>     } filter_v4_bf __section(".maps");
> 
> Here we set new flags to specify that a pool of tables should be used to select
> new tables from (BPF_WILDCARD_F_TM_STATIC_POOL), and that the field
> wildcard_tm_opts contains a list of tables to use (BPF_WILDCARD_F_TM_POOL_LIST,
> an alternative is to use a Cartesian product of arrays provided). The
> capture4_l3_opts is defined by a helper macro BPF_WILDCARD_TM_OPTS, where for
> each field we define a list of prefixes to use.  If a field is missing, then it
> will be always ignored.
> 
> The following changes and are not part of this RFC, but planned to be added before v1:
> 
>   * implement priorities, i.e., users will be able to specify rule priority as
>     u32 and rules with lower priorities will be matched first
> 
>   * implement !BPF_F_NO_PREALLOC: right now we always kalloc both elements and
>     new tables
> 
> Signed-off-by: Anton Protopopov <aspsk@xxxxxxxxxxxxx>
> ---
>  include/linux/bpf_types.h       |    1 +
>  include/uapi/linux/bpf.h        |  127 +++
>  kernel/bpf/Makefile             |    2 +-
>  kernel/bpf/syscall.c            |   51 +-
>  kernel/bpf/wildcard.c           | 1526 +++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h  |  129 ++-
>  tools/lib/bpf/bpf.c             |    8 +
>  tools/lib/bpf/bpf.h             |    5 +-
>  tools/lib/bpf/libbpf.c          |  423 +++++++++
>  tools/lib/bpf/libbpf_internal.h |    5 +-
>  10 files changed, 2269 insertions(+), 8 deletions(-)
>  create mode 100644 kernel/bpf/wildcard.c
> 

I'm interested in the goal of your patch but I'm having a grokking how
the map is actually used.

Is there any chance you could include a selftest or example?

Thanks,
Daniel

[...]



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux