Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> On Wed, Oct 04, 2023 at 10:46:23AM +0200, Florian Westphal wrote:
> > I don't think the dump paths were ever designed to munge existing
> > data. For those parts that provide full/consistent serialization,
> > e.g. when a single rule is fetched, it's fine.
> >
> > But whenever we may yield in between successive partial dumps, it's
> > not.
>
> Yes, netlink dumps do not provide atomic listing semantics, that is
> why there is the generation ID from userspace. I am trying to think
> of a way to achieve this from the kernel, but I have not come up with
> anything so far.
>
> From userspace, it should be possible to keep a generation ID per
> object in the cache, so the cache becomes a collection of objects of
> different generations; then, when listing, just take the objects that
> are still current. Or it should be possible to disambiguate this from
> userspace with the u64 handle, i.e. keep stale objects around and
> merge them into new ones when fetching the counters.
>
> But this is a lot of work in userspace, and it might get more
> complicated if more transactions happen while retrying (see the test
> script from Phil where a transaction happens in an infinite loop).
>
> This is the other open issue we have been discussing lately :)

Right, there is an amalgamation of different issues; let's not get
swamped trying to solve everything at once :-)

I've collected a few patches and pushed them out to :testing. The
stress test is still running, but so far it looks good to me. The
updated audit selftest passes, as does Xin's sctp test case.

I expect to do the PR in the next few hours or so.

I will follow up on nf-next once upstream has pulled the PR into net
and done a net -> net-next merge, which might take a while. Follow up
as in "send rebased patches that get rid of the async gc in
nft_set_rbtree".

Let me know if there is anything else I can help with.

Phil, I know all of this sucks from your point of view. This is also
my fault: I did not pay close enough attention to the reset patches,
and, to be clear, even if I had, I would likely have missed all of
the implications of reusing the dump infrastructure for this.

I have not applied Pablo's patch to revert the "timeout reset"
because it is relatively fresh compared to the other patches in the
batch (and the batch is already large enough). But I do agree that a
separate facility for timeout resets (timeout updates) would be
better.

I also think we need to find a different strategy for the
dump-and-reset part when the reset could be interrupted by a
transaction. ATM userspace would have to take a userspace lock, like
iptables-legacy does, to prevent any generation ID changes while a
"reset dump" is happening, and I hope we can all agree that this is
something we definitely do not want :-)
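
To make the generation ID dance concrete, here is a rough sketch of
the retry loop it enables in userspace. fetch_genid() and
dump_objects() are hypothetical stand-ins, not real API: fetch_genid()
would query NFT_MSG_GETGEN and return the NFTA_GEN_ID value, while
dump_objects() is whatever multi-part dump we are running.

	/* sketch only, not actual nft code */
	int dump_consistent(struct cache *c)
	{
		uint32_t before, after;
		int err;

		do {
			before = fetch_genid();
			err = dump_objects(c);	/* may span several
						   recvs; transactions
						   can land in between */
			if (err < 0)
				return err;
			after = fetch_genid();
		} while (before != after);	/* ruleset changed
						   mid-dump, retry */
		return 0;
	}

Note that this retry strategy is only sound for plain read-only dumps;
with dump-and-reset, the first (discarded) pass has already mutated
the counters or timeouts, so the retry cannot be transparent. That is
exactly the problem with reusing the dump infrastructure here.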
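
And, equally rough, the per-object bookkeeping idea Pablo describes:
keep stale entries keyed by the kernel-assigned u64 handle and fold
freshly dumped counters into them. All names below are made up for
illustration.

	struct cached_obj {
		uint64_t handle;	/* stable kernel-assigned handle */
		uint32_t genid;		/* generation last seen in */
		uint64_t pkts, bytes;
	};

	static void cache_merge(struct cached_obj *old,
				const struct cached_obj *new)
	{
		/* same handle, same object: take the fresh counters
		   and mark the entry current for this generation */
		old->genid = new->genid;
		old->pkts  = new->pkts;
		old->bytes = new->bytes;
	}

When listing, entries whose genid is not the current one would be the
stale leftovers to discard or merge, which gives a hint of how much
extra bookkeeping this pushes into userspace.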