On Fri, Oct 11, 2019 at 01:24:52PM +0200, Phil Sutter wrote:
> Hi,
>
> On Fri, Oct 11, 2019 at 11:28:23AM +0200, Pablo Neira Ayuso wrote:
> [...]
> > You could also just parse the ruleset twice in userspace, once to
> > calculate the cache you need and again to actually create the
> > transaction batch and push it into the kernel. That's a bit of a
> > poor man's approach, but it might work. You would need to invoke
> > xtables_restore_parse() twice.
>
> The problem with parsing twice is having to cache the input, which may
> be huge for xtables-restore.
>
> On Fri, Oct 11, 2019 at 12:20:52PM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Oct 11, 2019 at 12:09:11AM +0200, Phil Sutter wrote:
> > [...]
> > > Maybe we could go with a simpler solution for now, which is to check
> > > kernel genid again and drop the local cache if it differs from what's
> > > stored. If it doesn't, the current cache is still up to date and we
> > > may just fetch what's missing. Or does that leave room for a race
> > > condition?
> >
> > My concern with this approach is that, in dynamic ruleset update
> > scenarios with very frequent updates, you might lose the race while
> > building the cache in stages, forcing you to restart from scratch in
> > the middle of the transaction handling.
>
> In a very busy environment there's always trouble, simply because we
> can't atomically fetch the ruleset from the kernel, adjust it and
> submit our batch. Dealing with that means we're back at xtables-lock.
>
> > I prefer to calculate the cache that is needed in one go by analyzing
> > the batch, it's simpler. Note that we might still lose the race, since
> > the kernel might tell us we're working on a cache with an obsolete
> > generation ID, forcing us to restart.
>
> My idea for a conditional cache reset is based on the assumption that
> conflicts are rare and we want to optimize for the non-conflict case.
> The core logic would be:
>
> 1) fetch kernel genid into genid_start
> 2) if cache level > NFT_CL_NONE and cache genid != genid_start:
>    2a) drop local caches
>    2b) set cache level to NFT_CL_NONE
> 3) call cache fetchers based on current cache level and desired level
> 4) fetch kernel genid into genid_end
> 5) if genid_start != genid_end, goto 1
>
> So this is basically the old algorithm, but with step (2) added. What
> do you think?

Please write testcases to validate that races don't happen. Debugging
cache inconsistencies is not easy, which is why I like the idea of
calculating the required cache first and then building it in one go.

I'm fine with starting with a more simple approach in the short term.
Note that reports from users on these cache inconsistency problems are
usually sparse, which is a bit frustrating.

I understand a larger rework, compared to the more simple approach,
will take more time.
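
Below is a minimal sketch in C of the conditional cache reset loop
Phil outlines above. The helper functions and the cache levels other
than NFT_CL_NONE are hypothetical placeholders for illustration, not
the actual iptables-nft API:

#include <stdint.h>

enum cache_level {
	NFT_CL_NONE,	/* named in the thread */
	NFT_CL_TABLES,	/* illustrative */
	NFT_CL_CHAINS,	/* illustrative */
	NFT_CL_RULES,	/* illustrative */
};

struct nft_cache {
	enum cache_level level;	/* highest level fetched so far */
	uint32_t genid;		/* kernel genid the cache was built from */
};

/* hypothetical placeholders, not real library calls */
uint32_t fetch_genid(void);
void drop_caches(struct nft_cache *c);
void fetch_cache_level(struct nft_cache *c, enum cache_level level);

void cache_update(struct nft_cache *c, enum cache_level want)
{
	uint32_t genid_start, genid_end;

retry:
	genid_start = fetch_genid();			/* step 1 */

	if (c->level > NFT_CL_NONE &&
	    c->genid != genid_start) {			/* step 2 */
		drop_caches(c);				/* step 2a */
		c->level = NFT_CL_NONE;			/* step 2b */
	}

	while (c->level < want) {			/* step 3 */
		c->level++;
		fetch_cache_level(c, c->level);
	}

	genid_end = fetch_genid();			/* step 4 */
	if (genid_start != genid_end)			/* step 5 */
		goto retry;

	c->genid = genid_end;
}

The goto in step (5) is what handles the race Pablo describes: if
another process bumps the generation ID while the cache is being
fetched in stages, the second genid read differs, and on retry step
(2) drops the mixed-generation cache and restarts from NFT_CL_NONE.
The loop terminates only once a full fetch cycle observes a stable
generation ID.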