Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> On Wed, Oct 04, 2023 at 10:46:23AM +0200, Florian Westphal wrote:
> > I don't think the dump paths were ever designed to munge existing
> > data. For those parts that provide full/consistent serialization,
> > e.g. when a single rule is fetched, it's fine.
> >
> > But whenever we may yield in between successive partial dumps, it's
> > not.
>
> Yes, netlink dumps do not provide atomic listing semantics, that is
> why there is the generation ID from userspace. I am trying to think
> of a way to achieve this from the kernel, but I have not come up with
> anything so far.
>
> From userspace, it should be possible to keep a generation ID per
> object in the cache, so the cache becomes a collection of objects of
> different generations; then, when listing, just take the objects that
> are still current. Or it should be possible to disambiguate this from
> userspace with the u64 handle, i.e. keep stale objects around and
> merge them into new ones when fetching the counters.
>
> But this is a lot of work in userspace, and it might get more
> complicated if more transactions happen while retrying (see the test
> script from Phil where a transaction happens in an infinite loop).
>
> This is the other open issue we have been discussing lately :)

Right, there is an amalgamation of different issues; let's not get
swamped trying to solve everything at once :-)

I've collected a few patches and pushed them out to :testing. The
stress test is still running, but so far it looks good to me. The
updated audit selftest passes, as does Xin's sctp test case.

I expect to do the PR in the next few hours or so.

I will follow up on nf-next once upstream has pulled the PR into net
and done a net -> net-next merge, which might take a while. Follow up
as in "send rebased patches that get rid of the async gc in
nft_set_rbtree".

Let me know if there is anything else I can help with.

Phil, I know all of this sucks from your point of view. This is also
my fault: I did not pay close enough attention to the reset patches,
and, to be clear, even if I had, I would likely have missed all of
the implications of reusing the dump infrastructure for this.

I have not applied Pablo's patch to revert the "timeout reset"
because it is relatively fresh compared to the other patches in the
batch (and the batch is already large enough). But I do agree that a
separate facility for timeout resets (timeout updates) would be
better.

I also think we need to find a different strategy for the
dump-and-reset part when the reset could be interrupted by a
transaction. ATM userspace would have to take a userspace lock, like
iptables-legacy does, to prevent any generation ID changes while a
"reset dump" is happening, and I hope we can all agree that this is
something we definitely do not want :-)
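
To make the generation ID dance concrete, here is a rough sketch of
the retry loop it enables in userspace. fetch_genid() and
dump_objects() are hypothetical stand-ins, not real API: fetch_genid()
would query NFT_MSG_GETGEN and return the NFTA_GEN_ID value, while
dump_objects() is whatever multi-part dump we are running.

	/* sketch only, not actual nft code */
	int dump_consistent(struct cache *c)
	{
		uint32_t before, after;
		int err;

		do {
			before = fetch_genid();
			err = dump_objects(c);	/* may span several
						   recvs; transactions
						   can land in between */
			if (err < 0)
				return err;
			after = fetch_genid();
		} while (before != after);	/* ruleset changed
						   mid-dump, retry */
		return 0;
	}

Note that this retry strategy is only sound for plain read-only dumps;
with dump-and-reset, the first (discarded) pass has already mutated
the counters or timeouts, so the retry cannot be transparent. That is
exactly the problem with reusing the dump infrastructure here.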
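
And, equally rough, the per-object bookkeeping idea Pablo describes:
keep stale entries keyed by the kernel-assigned u64 handle and fold
freshly dumped counters into them. All names below are made up for
illustration.

	struct cached_obj {
		uint64_t handle;	/* stable kernel-assigned handle */
		uint32_t genid;		/* generation last seen in */
		uint64_t pkts, bytes;
	};

	static void cache_merge(struct cached_obj *old,
				const struct cached_obj *new)
	{
		/* same handle, same object: take the fresh counters
		   and mark the entry current for this generation */
		old->genid = new->genid;
		old->pkts  = new->pkts;
		old->bytes = new->bytes;
	}

When listing, entries whose genid is not the current one would be the
stale leftovers to discard or merge, which gives a hint of how much
extra bookkeeping this pushes into userspace.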