On Tue, Jul 23, 2024 at 02:19:25PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Jul 23, 2024 at 01:56:46PM +0200, Phil Sutter wrote:
> > Some digging and lots of printf's later:
> >
> > On Mon, Jul 22, 2024 at 11:34:01PM +0200, Pablo Neira Ayuso wrote:
> > [...]
> > > I can reproduce it:
> > >
> > > # nft -i
> > > nft> add table inet foo
> > > nft> add chain inet foo bar { type filter hook input priority filter; }
> > > nft> add rule inet foo bar accept
> >
> > This bumps cache->flags from 0 to 0x1f (no cache -> NFT_CACHE_OBJECT).
> >
> > > nft> insert rule inet foo bar index 0 accept
> >
> > This adds NFT_CACHE_RULE_BIT and NFT_CACHE_UPDATE, cache is updated (to
> > fetch rules).
> >
> > > nft> add rule inet foo bar index 0 accept
> >
> > No new flags for this one, so the code hits the 'genid == cache->genid +
> > 1' case in nft_cache_is_updated() which bumps the local genid and skips
> > a cache update. The new rule then references the cached copy of the
> > previously committed one which still does not have a handle. Therefore
> > link_rules() does its thing for references to uncommitted rules which
> > later fails.
> >
> > Pablo: Could you please explain the logic around this cache->genid
> > increment? Commit e791dbe109b6d ("cache: recycle existing cache with
> > incremental updates") is not clear to me in this regard. How can the
> > local process know it doesn't need whatever has changed in the kernel?
>
> The idea is to use the ruleset generation ID as a hint to infer if the
> existing cache can be recycled, to speed up incremental updates. This
> is not sufficient for the index cache, see below.

I have to revisit e791dbe109b6d: another process could race to bump the
generation ID incrementally, and I incorrectly assumed the cache is
consistent.
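
To make the race concrete, here is a toy Python model of the
'genid == cache->genid + 1' shortcut. This is only a sketch, not
libnftables code: Kernel, Cache, is_updated() and demo_race() are
hypothetical names, and the model assumes every commit bumps the
generation ID by exactly one.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """Hypothetical stand-in for the kernel ruleset (not a real API)."""
    genid: int = 0
    rules: list = field(default_factory=list)

    def commit(self, rule):
        # Assumption: each committed transaction bumps the genid by one.
        self.rules.append(rule)
        self.genid += 1


@dataclass
class Cache:
    """Hypothetical userspace cache tagged with the genid it was built at."""
    genid: int = 0
    rules: list = field(default_factory=list)

    def refresh(self, kernel):
        # Full cache update: copy the ruleset and remember its genid.
        self.rules = list(kernel.rules)
        self.genid = kernel.genid

    def is_updated(self, kernel_genid):
        # Toy version of the shortcut, not the real nft_cache_is_updated().
        if kernel_genid == self.genid:
            return True
        if kernel_genid == self.genid + 1:
            # Guess that the +1 came from our own last commit and
            # recycle the cache without refetching anything.
            self.genid = kernel_genid
            return True
        return False


def demo_race():
    kernel = Kernel()
    cache = Cache()
    cache.refresh(kernel)

    # Another process commits a rule behind our back: genid goes 0 -> 1.
    kernel.commit("rule from another process")

    recycled = cache.is_updated(kernel.genid)  # shortcut accepts the cache
    stale = cache.rules != kernel.rules        # but the contents differ
    return recycled, stale


if __name__ == "__main__":
    print(demo_race())
```

In this model the cache is accepted as current although it never saw
the other process's rule, because a +1 step alone cannot tell "my own
commit" apart from "somebody else's commit".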