Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> I'm making another pass on this series, a few things I would like to
> ask, see below.
>
> On Thu, Nov 07, 2024 at 06:44:08PM +0100, Florian Westphal wrote:
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > index bdf5ba21c76d..e96e538fe2eb 100644
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> > @@ -25,6 +25,7 @@
> >
> >  #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
> >  #define NFT_SET_MAX_ANONLEN 16
> > +#define NFT_MAX_SET_NELEMS ((2048 - sizeof(struct nft_trans_elem)) / sizeof(struct nft_trans_one_elem))
>
> This NFT_MAX_SET_NELEMS is to stay in a specific kmalloc-X?
>
> What is the logic behind this NFT_MAX_SET_NELEMS?

I want to avoid making huge kmalloc requests, and also avoid huge
krealloc overhead.  I think the kmalloc-2048 slab is a good fit.

I can add a comment, or increase this to kmalloc-4096, but I'd prefer
not to go over that, since kmalloc allocations > 1 page are more prone
to allocation failure.

> >  unsigned int nf_tables_net_id __read_mostly;
> >
> > @@ -391,6 +392,69 @@ static void nf_tables_unregister_hook(struct net *net,
> >  	return __nf_tables_unregister_hook(net, table, chain, false);
> >  }
> >
> > +static bool nft_trans_collapse_set_elem_allowed(const struct nft_trans_elem *a, const struct nft_trans_elem *b)
> > +{
> > +	return a->set == b->set && a->bound == b->bound && a->nelems < NFT_MAX_SET_NELEMS;
>
> I think this a->bound == b->bound check is defensive.
>
> This code is collapsing only two consecutive transactions, the one at
> the tail (where nelems > 1) and the new transaction (where nelems ==
> 1).

Yes.

> bound state should only change in case there is a NEWRULE transaction
> in between.

Yes.

> I am trying to find an error scenario where a->bound == b->bound
> evaluates false.  I considered the following:
>
> newelem -> newrule -> newelem
>
> where newrule has these expressions:
>
> lookup -> error
>
> in this case, the newrule error path is exercised:
>
> nft_rule_expr_deactivate(&ctx, rule, NFT_TRANS_PREPARE_ERROR);
>
> this calls nf_tables_deactivate_set(), which calls
> nft_set_trans_unbind(), then a->bound is restored to false.  Rule is
> released and no transaction is added.
>
> Because if this succeeds:
>
> newelem -> newrule -> newelem
>
> then no element collapsing can happen, because we only collapse what
> is at the tail.
>
> TLDR; the check does no harm, but it looks unlikely to happen to me.

Yes, it's a defensive check.  I could add a comment.  The WARN_ON_ONCE
for trans->nelems != 1 exists for the same reason.
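Going back to the NFT_MAX_SET_NELEMS question above, here is a quick
user-space sketch of the sizing argument (the struct layouts below are
made-up stand-ins, only the arithmetic matters): collapsing only happens
while the tail transaction has fewer than NFT_MAX_SET_NELEMS elements,
so the struct_size() handed to krealloc() can never exceed 2048 bytes
and the request stays within the kmalloc-2048 slab.

	#include <assert.h>
	#include <stddef.h>
	#include <stdio.h>

	/* stand-ins for nft_trans_one_elem / nft_trans_elem, sizes invented */
	struct fake_one_elem { void *priv; };
	struct fake_trans_elem {
		void *set;
		unsigned int nelems;
		struct fake_one_elem elems[];	/* flexible array, like the real struct */
	};

	#define FAKE_MAX_SET_NELEMS \
		((2048 - sizeof(struct fake_trans_elem)) / sizeof(struct fake_one_elem))

	int main(void)
	{
		/* worst case after a collapse: header plus the maximum element count */
		size_t worst = sizeof(struct fake_trans_elem) +
			       FAKE_MAX_SET_NELEMS * sizeof(struct fake_one_elem);

		assert(worst <= 2048);
		printf("cap: %zu elems, worst-case size: %zu bytes\n",
		       (size_t)FAKE_MAX_SET_NELEMS, worst);
		return 0;
	}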
> > +}
> > +
> > +static bool nft_trans_collapse_set_elem(struct nftables_pernet *nft_net,
> > +					struct nft_trans_elem *tail,
> > +					struct nft_trans_elem *trans,
> > +					gfp_t gfp)
> > +{
> > +	unsigned int nelems, old_nelems = tail->nelems;
> > +	struct nft_trans_elem *new_trans;
> > +
> > +	if (!nft_trans_collapse_set_elem_allowed(tail, trans))
> > +		return false;
> > +
> > +	if (WARN_ON_ONCE(trans->nelems != 1))
> > +		return false;
> > +
> > +	if (check_add_overflow(old_nelems, trans->nelems, &nelems))
> > +		return false;
> > +
> > +	/* krealloc might free tail which invalidates list pointers */
> > +	list_del_init(&tail->nft_trans.list);
> > +
> > +	new_trans = krealloc(tail, struct_size(tail, elems, nelems), gfp);
> > +	if (!new_trans) {
> > +		list_add_tail(&tail->nft_trans.list, &nft_net->commit_list);
> > +		return false;
> > +	}
> > +
> > +	INIT_LIST_HEAD(&new_trans->nft_trans.list);
>
> This initialization is also defensive, this element is added via
> list_add_tail().

Yes, the first arg to list_add(_tail) can live without initialisation.
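To make that concrete, here is a tiny self-contained user-space demo;
the list_add_tail() below only mirrors the kernel's
__list_add()/list_add_tail() insertion scheme (debug checks and
WRITE_ONCE() dropped), it is not the kernel code itself.  Both pointers
of the entry being inserted are written unconditionally, so whatever
INIT_LIST_HEAD() stored is overwritten anyway.

	#include <assert.h>
	#include <stdio.h>

	/* minimal stand-in for the kernel's struct list_head */
	struct list_head {
		struct list_head *next, *prev;
	};

	/* stripped-down tail insertion, same scheme as the kernel helper */
	static void list_add_tail(struct list_head *new, struct list_head *head)
	{
		struct list_head *prev = head->prev;

		head->prev = new;
		new->next = head;	/* both fields of "new" are written here ... */
		new->prev = prev;	/* ... so its previous contents never matter */
		prev->next = new;
	}

	int main(void)
	{
		struct list_head head = { &head, &head };	/* empty list */
		struct list_head node;	/* deliberately left uninitialised */

		list_add_tail(&node, &head);
		assert(head.next == &node && node.next == &head && node.prev == &head);
		printf("uninitialised node linked just fine\n");
		return 0;
	}

So the INIT_LIST_HEAD() in the patch is redundant for the subsequent
list_add_tail(); harmless, but not needed.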