Re: [nf-next PATCH v3 3/3] netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests

Florian Westphal <fw@xxxxxxxxx> · Thu, 19 Oct 2023 13:59:09 +0200

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 19, 2023 at 01:33:47PM +0200, Phil Sutter wrote:
> > Rule reset is not concurrency-safe per-se, so multiple CPUs may reset
> > the same rule at the same time. At least counter and quota expressions
> > will suffer from value underruns in this case.
> > 
> > Prevent this by introducing dedicated locking callbacks for nfnetlink
> > and the asynchronous dump handling to serialize access.
> > 
> > Signed-off-by: Phil Sutter <phil@xxxxxx>
> > ---
> > Changes since v2:
> > - Keep local variable 'nft_net' in nf_tables_getrule_reset()
> > - No need for local variable 'family' in same function (used only once
> >   after all the churn)
> > ---
> >  net/netfilter/nf_tables_api.c | 74 ++++++++++++++++++++++++++++-------
> >  1 file changed, 60 insertions(+), 14 deletions(-)
> > 
> > diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> > index 584d3b204372..fbb688c9903c 100644
> > --- a/net/netfilter/nf_tables_api.c
> > +++ b/net/netfilter/nf_tables_api.c
> [...]
> > +static int nf_tables_dumpreset_rules(struct sk_buff *skb,
> > +				     struct netlink_callback *cb)
> > +{
> > +	struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk));
> > +	int ret;
> > +
> > +	mutex_lock(&nft_net->commit_mutex);
> > +	ret = nf_tables_dump_rules(skb, cb);
> > +	mutex_unlock(&nft_net->commit_mutex);
> 
> NACK.
> 
> This just mitigates the problem we are discussing, when there is an
> interference with an ongoing transaction.

It does prevent corruption of the internal state when two resets
run in parallel.
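To illustrate the failure mode, here is a simplified userspace model,
not the kernel code: the names are hypothetical and a single u64 stands
in for the per-CPU counters. A reset is modeled as fetch-then-subtract.
If two resetters both fetch before either subtracts, each reports the
full value and the counter wraps around. Running each fetch/subtract
pair to completion before the next starts, which is the ordering a
mutex around the reset path enforces, avoids that.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a counter expression: one 64-bit
 * value instead of the kernel's per-CPU counters. */
struct model_counter {
	uint64_t packets;
};

/* First half of a reset: read the total that will be reported. */
static uint64_t reset_fetch(const struct model_counter *c)
{
	return c->packets;
}

/* Second half of a reset: subtract what was reported. If the same
 * total is subtracted twice, the unsigned value wraps around. */
static void reset_apply(struct model_counter *c, uint64_t reported)
{
	c->packets -= reported;
}

/* Unserialized interleaving: both resetters fetch before either one
 * applies, so both report the full total and both subtract it. */
void racy_double_reset(struct model_counter *c, uint64_t *a, uint64_t *b)
{
	*a = reset_fetch(c);
	*b = reset_fetch(c);
	reset_apply(c, *a);
	reset_apply(c, *b);
}

/* Serialized order, as a lock around the reset path would enforce:
 * each fetch/apply pair completes before the next one starts. */
void serialized_double_reset(struct model_counter *c, uint64_t *a,
			     uint64_t *b)
{
	*a = reset_fetch(c);
	reset_apply(c, *a);
	*b = reset_fetch(c);
	reset_apply(c, *b);
}
```

Starting from 100 packets, the racy order reports 100 twice and leaves
the counter at 2^64 - 100; the serialized order reports 100 and then 0
and leaves it at 0.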

If you believe that we have to make the entire dump consistent even
when the reset flag is given, I see no choice but to completely remove
reset-from-dump support.

What is your suggested solution?

AFAICS, with this series, userspace can, in theory, collect the
partial dumps and merge them into a consistent output manually.

That said, I don't think it's realistic to expect userspace to
get this right.

That leaves one option: userspace does a dump (without reset); if
that dump was consistent, it walks it and issues a per-handle
get-with-reset request for each rule, then updates the (not-yet-printed)
dump with the newly obtained stateful results.
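A rough sketch of that userspace flow, assuming hypothetical callbacks
in place of the actual netlink requests (these are not libnftnl or
libmnl APIs). The dump callback is assumed to report whether the dump
was interrupted by a concurrent transaction, and dropping rules that
vanish between the dump and their reset is an assumed policy, not
something decided in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-rule state as seen by userspace. */
struct rule_state {
	uint64_t handle;
	uint64_t packets;   /* stateful data, e.g. from a counter */
};

/* Hypothetical stand-ins for the netlink requests. */
struct rule_ops {
	/* Dump without reset; returns false if the dump was interrupted
	 * by a concurrent transaction (i.e. inconsistent). */
	bool (*dump)(void *ctx, struct rule_state *out, size_t max,
		     size_t *n);
	/* Single-rule get-with-reset by handle; returns false if the
	 * rule no longer exists. */
	bool (*get_reset)(void *ctx, uint64_t handle,
			  struct rule_state *out);
};

/* Dump first, then reset rule by rule and patch the not-yet-printed
 * dump with the values each reset returned. Returns the number of
 * entries left in out, or -1 if the dump was inconsistent and the
 * caller should retry. */
long dump_then_reset(const struct rule_ops *ops, void *ctx,
		     struct rule_state *out, size_t max)
{
	size_t n, kept = 0;

	if (!ops->dump(ctx, out, max, &n))
		return -1;

	for (size_t i = 0; i < n; i++) {
		struct rule_state fresh;

		/* Assumed policy: a rule deleted since the dump is dropped. */
		if (!ops->get_reset(ctx, out[i].handle, &fresh))
			continue;
		out[kept++] = fresh;
	}
	return (long)kept;
}
```

Only the per-handle requests reset state, so an inconsistent dump can
simply be retried without having zeroed anything.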