Hi Pablo,

On Sat, 23 Nov 2019 21:01:08 +0100
Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:

> Hi Stefano,
>
> On Fri, Nov 22, 2019 at 02:40:00PM +0100, Stefano Brivio wrote:
> [...]
> > diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
> > index bb9b049310df..f8dbeac14898 100644
> > --- a/include/uapi/linux/netfilter/nf_tables.h
> > +++ b/include/uapi/linux/netfilter/nf_tables.h
> > @@ -48,6 +48,7 @@ enum nft_registers {
> >
> >  #define NFT_REG_SIZE 16
> >  #define NFT_REG32_SIZE 4
> > +#define NFT_REG32_COUNT (NFT_REG32_15 - NFT_REG32_00 + 1)
> >
> >  /**
> >   * enum nft_verdicts - nf_tables internal verdicts
> > @@ -275,6 +276,7 @@ enum nft_rule_compat_attributes {
> >   * @NFT_SET_TIMEOUT: set uses timeouts
> >   * @NFT_SET_EVAL: set can be updated from the evaluation path
> >   * @NFT_SET_OBJECT: set contains stateful objects
> > + * @NFT_SET_SUBKEY: set uses subkeys to map intervals for multiple fields
> >   */
> >  enum nft_set_flags {
> >      NFT_SET_ANONYMOUS = 0x1,
> > @@ -284,6 +286,7 @@ enum nft_set_flags {
> >      NFT_SET_TIMEOUT = 0x10,
> >      NFT_SET_EVAL = 0x20,
> >      NFT_SET_OBJECT = 0x40,
> > +    NFT_SET_SUBKEY = 0x80,
> >  };
> >
> >  /**
> > @@ -309,6 +312,17 @@ enum nft_set_desc_attributes {
> >  };
> >  #define NFTA_SET_DESC_MAX (__NFTA_SET_DESC_MAX - 1)
> >
> > +/**
> > + * enum nft_set_subkey_attributes - subkeys for multiple ranged fields
> > + *
> > + * @NFTA_SET_SUBKEY_LEN: length of single field, in bits (NLA_U32)
> > + */
> > +enum nft_set_subkey_attributes {
>
> Missing NFTA_SET_SUBKEY_UNSPEC here.
>
> Not a problem if nla_parse_nested*() is not used as in your case,
> probably good for consistency, in case there is a need for using such
> function in the future.
>
> > +    NFTA_SET_SUBKEY_LEN,
> > +    __NFTA_SET_SUBKEY_MAX
> > +};
> > +#define NFTA_SET_SUBKEY_MAX (__NFTA_SET_SUBKEY_MAX - 1)
> > +
> >  /**
> >   * enum nft_set_attributes - nf_tables set netlink attributes
> >   *
> > @@ -327,6 +341,7 @@ enum nft_set_desc_attributes {
> >   * @NFTA_SET_USERDATA: user data (NLA_BINARY)
> >   * @NFTA_SET_OBJ_TYPE: stateful object type (NLA_U32: NFT_OBJECT_*)
> >   * @NFTA_SET_HANDLE: set handle (NLA_U64)
> > + * @NFTA_SET_SUBKEY: subkeys for multiple ranged fields (NLA_NESTED)
> >   */
> >  enum nft_set_attributes {
> >      NFTA_SET_UNSPEC,
> > @@ -346,6 +361,7 @@ enum nft_set_attributes {
> >      NFTA_SET_PAD,
> >      NFTA_SET_OBJ_TYPE,
> >      NFTA_SET_HANDLE,
> > +    NFTA_SET_SUBKEY,
>
> Could you use NFTA_SET_DESC instead for this? The idea is to add the
> missing front-end code to parse this new attribute and store the
> subkeys length in set->desc.klen[], hence nft_pipapo_init() can just
> use the already parsed data.

Logically, I think it makes sense. I'll try to implement this in nft
and libnftnl and see if some fundamental issue pops up there.

> I think this will simplify the code that I'm seeing in
> nft_pipapo_init() a bit since no netlink parsing will be required.

I don't think it makes a real difference there, because the actual
parsing parts are rather limited:

    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        [...]
        if (nla_len(attr) != sizeof(klen) ||
            nla_type(attr) != NFTA_SET_SUBKEY_LEN)
            return -EINVAL;
    }
    [...]
    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        klen = ntohl(nla_get_be32(attr));
        [...]
    }

the rest is validations (specific for this set type):

    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        if (++field_count >= NFT_PIPAPO_MAX_FIELDS)
            return -EINVAL;
        [...]
    }
    [...]
    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        [...]
        if (!klen || klen % NFT_PIPAPO_GROUP_BITS)
            goto out_free;

        if (klen > NFT_PIPAPO_MAX_BITS)
            goto out_free;
        [...]
    }

and calculations (also specific):

    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        if (++field_count >= NFT_PIPAPO_MAX_FIELDS)
        [...]
    }

    nla_for_each_nested(attr, nla[NFTA_SET_SUBKEY], rem) {
        [...]
        priv->groups += f->groups = klen / NFT_PIPAPO_GROUP_BITS;
        priv->width += round_up(klen / BITS_PER_BYTE, sizeof(u32));
        [...]
    }

that we would still need.

> I'm attaching a sketch patch, including also the use of NFTA_LIST_ELEM:
>
> NFTA_SET_DESC
>   NFTA_SET_DESC_SIZE
>   NFTA_SET_DESC_SUBKEY
>     NFTA_LIST_ELEM
>       NFTA_SET_SUBKEY_LEN
>     NFTA_LIST_ELEM
>       NFTA_SET_SUBKEY_LEN
>     ...
>
> Just in case there's a need for more fields to describe the subkey in
> the future, it's just more boilerplate code for the future
> extensibility.

Thanks! I'll play with it and see if I can fit all the pieces.

> Another suggestion is to rename NFT_SET_SUBKEY to NFT_SET_CONCAT, to
> signal the kernel that userspace wants a datastructure that knows how
> to deal with concatenations. Although concatenations can be done by
> hashtable already, this flag is just interpreted by the kernel as a
> hint on what kind of datastructure would fit better for what is
> needed. The combination of the NFT_SET_INTERVAL and the NFT_SET_CONCAT
> (if you're fine with the rename, of course) is what will kick in
> pipapo to be used.

I think that NFT_SET_CONCAT as you propose is conceptually a better
fit. I'm worried about the confusion this might generate for other set
implementations.

That is, a reasonable expectation is that userspace passes
NFT_SET_CONCAT whenever there's a concatenation, and hash
implementations support sets with that flag, too, so I would add it to
the supported feature flags of hash types, and it wouldn't be there for
rbtree.

Right now, that won't break anything: the flag might or might not be
present depending on userspace version, and selection of hash types
would proceed as usual.
But I'm worried that we might miss this subtlety in the future and
break concatenation support for older userspace versions.

Another idea could be that we get rid of this flag altogether: if we
move "subkeys" to set->desc, the ->estimate() functions of rbtree and
pipapo can check for those and refuse or allow set selection
accordingly.

I have no idea yet if this introduces further complexity for nft,
because there we would need to decide how to create start/end elements
depending on the existing set description instead of using a single
flag. I can give it a try if it makes sense.

-- 
Stefano
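
For illustration only, the set-specific validation and size calculation
discussed in this message can be tried out in userspace on an array of
already-parsed field lengths, which is roughly what nft_pipapo_init()
would see if the lengths were stored in set->desc. This is a minimal
sketch, not code from the patch: the constant values below, the struct
and the helper name pipapo_layout() are assumptions made for the
example, and the check for an empty field list is an addition.

```c
#include <stddef.h>

/* Assumed values, for illustration only: the message does not state them. */
#define NFT_PIPAPO_GROUP_BITS	4
#define NFT_PIPAPO_MAX_BITS	128
#define NFT_PIPAPO_MAX_FIELDS	16
#define BITS_PER_BYTE		8

/* Hypothetical result type, standing in for the priv fields in the message. */
struct layout {
	unsigned int groups;	/* sum of per-field bit groups */
	unsigned int width;	/* key width in bytes, each field padded to 32 bits */
};

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned int round_up_pow2(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: validate field lengths (in bits) and compute the
 * totals. Returns 0 on success, -1 if the description is rejected.
 */
static int pipapo_layout(const unsigned int *klen, size_t n,
			 struct layout *out)
{
	size_t i;

	/* Same bound as the quoted "++field_count >= MAX_FIELDS" check. */
	if (n == 0 || n >= NFT_PIPAPO_MAX_FIELDS)
		return -1;

	out->groups = 0;
	out->width = 0;

	for (i = 0; i < n; i++) {
		/* Length must be non-zero and a whole number of groups. */
		if (!klen[i] || klen[i] % NFT_PIPAPO_GROUP_BITS)
			return -1;

		if (klen[i] > NFT_PIPAPO_MAX_BITS)
			return -1;

		out->groups += klen[i] / NFT_PIPAPO_GROUP_BITS;
		out->width += round_up_pow2(klen[i] / BITS_PER_BYTE, 4);
	}

	return 0;
}
```

With these assumed constants, two fields of 32 and 16 bits give 12
groups and a width of 8 bytes, while a 30-bit field is rejected because
it is not a multiple of the group size.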