Hi Phil,

On Wed, Mar 04, 2020 at 03:13:34AM +0100, Phil Sutter wrote:
> Hi Pablo,
>
> On Tue, Mar 03, 2020 at 09:55:54PM +0100, Pablo Neira Ayuso wrote:
> > On Tue, Mar 03, 2020 at 02:02:52AM +0100, Phil Sutter wrote:
> > > Hi Pablo,
> > >
> > > On Mon, Mar 02, 2020 at 08:19:30PM +0100, Pablo Neira Ayuso wrote:
> > > > On Mon, Mar 02, 2020 at 06:53:55PM +0100, Phil Sutter wrote:
> > > > > iptables-nft-restore calls nft_action(h, NFT_COMPAT_COMMIT) for each
> > > > > COMMIT line in input. When restoring a dump containing multiple large
> > > > > tables, chances are nft_rebuild_cache() has to run multiple times.
> >
> > It is true that chances are this code runs multiple times since the
> > new fine-grained caching logic is in place.
>
> AFAICT, this is not related to the granularity of the caching logic. The
> crux is that your fix of Florian's concurrency fix in commit
> f6ad231d698c7 ("nft: keep original cache in case of ERESTART") ignores
> the fact that the cache may have to be rebuilt multiple times. I wasn't
> aware of it either, but knowing that each COMMIT line causes a commit
> internally makes it obvious. Your patch adds code to increment
> cache_index but none to reset it to zero.

Thanks for explaining.

[...]

> > Would this patch still work after this series is applied:
> >
> > https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=151404
> >
> > That is working and passing tests. It is just missing the code to
> > restore the fine-grained dumping, which should be easy to add.
> >
> > That logic will really reduce the chances to exercise all of this
> > cache dump / cache cancel path. Bugs in this cache consistency code
> > are usually not that easy to trigger and usually hard to fix.
> >
> > I just think it would be a pity if that work ends up in the trash can.
>
> I didn't review those patches yet, but from a quick glance it doesn't
> seem to touch the problematic code around __nft_flush_cache(). Let's
> make a deal: You accept my fix for the existing cache logic and I'll in
> return fix your series if necessary and at least find out what needs to
> be done so it doesn't cause a performance regression.

OK, deal :-)