Re: [PATCH] netfilter: nf_tables: restrict expression reduction to first expression

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Wed, 18 May 2022 14:21:57 +0200

On Wed, May 18, 2022 at 01:40:21PM +0200, Phil Sutter wrote:
> On Wed, May 18, 2022 at 01:01:50PM +0200, Pablo Neira Ayuso wrote:
> > On Wed, May 18, 2022 at 12:51:00PM +0200, Phil Sutter wrote:
> > > Hi,
> > > 
> > > On Wed, May 18, 2022 at 12:08:42PM +0200, Pablo Neira Ayuso wrote:
> > > > Either userspace or kernelspace needs to pre-fetch keys unconditionally
> > > > before comparisons for this to work. Otherwise, register tracking data
> > > > is misleading and it might result in reducing expressions whose data is
> > > > not yet in registers.
> > > > 
> > > > The first expression is always guaranteed to be evaluated; therefore,
> > > > keep tracking registers but restrict reduction to the first expression.
> > > > 
> > > > Fixes: b2d306542ff9 ("netfilter: nf_tables: do not reduce read-only expressions")
> > > > Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> > > > ---
> > > > @Phil, you mentioned a way to simplify this patch, I don't see how,
> > > > just let me know.
> > > 
> > > Not a big one. Instead of:
> > > 
> > > |	if (nft_expr_reduce(&track, expr)) {
> > > |		if (reduce) {
> > > |			reduce = false;
> > > |			expr = track.cur;
> > > |			continue;
> > > |		}
> > > |	} else if (reduce) {
> > > |		reduce = false;
> > > |	}
> > > 
> > > One could do:
> > > 
> > > |	if (nft_expr_reduce(&track, expr) && reduce) {
> > > |		reduce = false;
> > > |		expr = track.cur;
> > > |		continue;
> > > |	}
> > > |	reduce = false;
> > 
> > I'll send v2 using this idiom.
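
A quick way to convince oneself that the two forms quoted above behave the
same is to model them outside the kernel. The sketch below is only a toy:
step_nested() and step_flat() are hypothetical helpers, the `reduced`
argument stands in for the return value of nft_expr_reduce(), and the
return value says whether the loop would take the `continue` branch.

```c
#include <stdbool.h>

/* Nested form: returns true when the loop would "continue" (skip the
 * expression); *reduce is updated in place. */
static bool step_nested(bool reduced, bool *reduce)
{
	if (reduced) {
		if (*reduce) {
			*reduce = false;
			return true;
		}
	} else if (*reduce) {
		*reduce = false;
	}
	return false;
}

/* Flattened form: same contract as step_nested(). */
static bool step_flat(bool reduced, bool *reduce)
{
	if (reduced && *reduce) {
		*reduce = false;
		return true;
	}
	*reduce = false;
	return false;
}

/* Compare both forms over all four input combinations. */
static bool forms_agree(void)
{
	for (int reduced = 0; reduced <= 1; reduced++) {
		for (int flag = 0; flag <= 1; flag++) {
			bool a = flag, b = flag;

			if (step_nested(reduced, &a) != step_flat(reduced, &b) ||
			    a != b)
				return false;
		}
	}
	return true;
}
```

In both forms the reduce call itself is made unconditionally, so register
tracking side effects are not affected by the rewrite.
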
> > 
> > > Regarding later pre-fetching, one should distinguish between expressions
> > > that (may) set verdict register and those that don't. There are pitfalls
> > > though, e.g. error conditions handled that way.
> > > 
> > > Maybe introduce a new nft_expr_type field and set reduce like so:
> > > 
> > > | reduce = reduce && expr->ops->type->reduce;
> > 
> > Could you elaborate?
> 
> Well, an expression which may set verdict register to NFT_BREAK should
> prevent reduction of later expressions in same rule as it may stop rule
> evaluation at run-time. This is obvious for nft_cmp, but nft_meta is
> also a candidate: NFT_META_IFTYPE causes NFT_BREAK if pkt->skb->dev is
> NULL. The optimizer must not assume later expressions are evaluated.

How many other expressions are breaking when fetching the key?

> A first step might be said nft_expr_type field indicating a given
> expression might stop expression evaluation. Therefore:
> 
> | reduce = reduce && expr->ops->type->reduce;
> 
> would continue expression reduction if not already stopped and the
> current expression doesn't end it.
> 
> Taking nft_meta as example again:
> 
> * Behaviour changes based on nft_expr_type::select_ops result
> * Some keys are guaranteed to not stop expression evaluation:
>   NFT_META_LEN for instance will always just fetch skb->len. So
>   introduce a callback instead:
>
> | bool nft_expr_ops::may_break(const struct nft_expr *expr);
>
> Then "ask" the expression whether it may change verdict register:
> 
> | reduce = reduce && !expr->ops->may_break(expr);
> 
> With nft_meta_get_ops, we'd have:
> 
> | bool nft_meta_get_may_break(const struct nft_expr *expr)
> | {
> | 	switch (nft_expr_priv(expr)->key) {
> | 	case NFT_META_LEN:
> | 	case NFT_META_PROTOCOL:
> | 	[...]
> | 		return false;
> | 	case NFT_META_IFTYPE:
> | 	[...]
> | 		return true;
> | 	}
> | }
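
To make the effect of such a callback concrete, here is a self-contained
toy model. Everything prefixed toy_ is hypothetical and not a kernel
definition; only the split between the LEN/PROTOCOL keys and the IFTYPE key
is taken from the example above. It follows the prose description:
reduction stays enabled only while no earlier expression in the rule may
end evaluation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the meta keys named in the thread. */
enum toy_meta_key { TOY_META_LEN, TOY_META_PROTOCOL, TOY_META_IFTYPE };

struct toy_expr {
	enum toy_meta_key key;
};

/* True if evaluating this expression could end rule evaluation early. */
static bool toy_meta_may_break(const struct toy_expr *expr)
{
	switch (expr->key) {
	case TOY_META_LEN:
	case TOY_META_PROTOCOL:
		return false;
	case TOY_META_IFTYPE:
		return true;
	}
	return true; /* unknown key: assume the conservative answer */
}

/* Count the leading expressions of a rule that are guaranteed to be
 * evaluated, i.e. those still eligible for reduction. */
static size_t toy_reducible_prefix(const struct toy_expr *exprs, size_t n)
{
	bool reduce = true;
	size_t count = 0;

	for (size_t i = 0; i < n && reduce; i++) {
		count++; /* exprs[i] is reached unconditionally */
		reduce = reduce && !toy_meta_may_break(&exprs[i]);
	}
	return count;
}
```

Note that the expression which may break is itself still counted, since it
is always reached; only the expressions after it lose the guarantee.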

And simply remove that NFT_BREAK and set a value that will not ever
match via nft_cmp?

> Another thing about your proposed patch: Expressions may update
> registers even if not reduced. Could that upset later reduction
> decision? E.g.:
> 
> | ip saddr 1.0.0.1 ip daddr 2.0.0.2 accept
> | ip daddr 3.0.0.3 accept
> 
> Code no longer allows the first rule's 'ip daddr' expression to be
> reduced (no matter what's in registers already), but its existence
> causes reduction of the second rule's 'ip daddr' expression, right?

We cannot make assumptions on ip daddr because there is a cmp right
before (to test for ip saddr 1.0.0.1), unless keys are unconditionally
prefetched.
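
As a rough illustration of the problem with the two rules quoted above,
consider the toy interpreter below. It is only a sketch under stated
assumptions: all toy_ names are hypothetical, both rules share a single
register, verdicts are ignored, and addresses are plain host-order
constants. The `elided` flag models a load that an optimizer dropped
because it believed the register already held the key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_LOAD_SADDR, TOY_LOAD_DADDR, TOY_CMP };

struct toy_insn {
	enum toy_op op;
	uint32_t imm;	/* TOY_CMP only: value compared against the register */
	bool elided;	/* load dropped by a (hypothetical) optimizer */
};

struct toy_pkt {
	uint32_t saddr, daddr;
};

/* Run one rule on a shared register; true if every cmp matched. */
static bool toy_run_rule(const struct toy_insn *insn, size_t n,
			 const struct toy_pkt *pkt, uint32_t *reg)
{
	for (size_t i = 0; i < n; i++) {
		switch (insn[i].op) {
		case TOY_LOAD_SADDR:
			if (!insn[i].elided)
				*reg = pkt->saddr;
			break;
		case TOY_LOAD_DADDR:
			if (!insn[i].elided)
				*reg = pkt->daddr;
			break;
		case TOY_CMP:
			if (*reg != insn[i].imm)
				return false; /* rest of the rule is skipped */
			break;
		}
	}
	return true;
}

/* Run rule 1, then report whether rule 2 matches the packet. */
static bool toy_rule2_matches(const struct toy_pkt *pkt, bool elide_rule2_load)
{
	const struct toy_insn rule1[] = {
		{ TOY_LOAD_SADDR, 0, false }, { TOY_CMP, 0x01000001, false },
		{ TOY_LOAD_DADDR, 0, false }, { TOY_CMP, 0x02000002, false },
	};
	const struct toy_insn rule2[] = {
		{ TOY_LOAD_DADDR, 0, elide_rule2_load },
		{ TOY_CMP, 0x03000003, false },
	};
	uint32_t reg = 0;

	toy_run_rule(rule1, 4, pkt, &reg);
	return toy_run_rule(rule2, 2, pkt, &reg);
}
```

For a packet whose saddr is not 1.0.0.1, rule 1 stops at its first cmp and
never loads daddr, so eliding the load in rule 2 leaves saddr in the
register and the daddr comparison goes wrong. When the saddr cmp does pass,
the elided and non-elided variants agree.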