Re: [iptables PATCH v2 03/14] xtables: Implement per chain rule cache

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Mon, 17 Dec 2018 22:48:52 +0100

On Mon, Dec 17, 2018 at 10:11:30PM +0100, Phil Sutter wrote:
> On Mon, Dec 17, 2018 at 08:25:24PM +0100, Pablo Neira Ayuso wrote:
> > On Mon, Dec 17, 2018 at 07:24:24PM +0100, Phil Sutter wrote:
> > > Hi Pablo,
> > > 
> > > On Mon, Dec 17, 2018 at 06:42:49PM +0100, Pablo Neira Ayuso wrote:
> > > > On Thu, Dec 13, 2018 at 12:15:56PM +0100, Phil Sutter wrote:
> > > > > Use recently introduced support for rules inside chains in libnftnl to
> > > > > introduce a rule cache per chain instead of a global one.
> > > > > 
> > > > > A tricky bit is to decide if cache should be updated or not. Previously,
> > > > > the global rule cache was populated just once and then reused unless
> > > > > being flushed completely (via call to flush_rule_cache() with
> > > > > NULL-pointer table argument). Resemble this behaviour by introducing a
> > > > > boolean indicating cache status and fetch rules for all chains when
> > > > > updating the chain cache in nft_chain_list_get().
> > > > 
> > > > This is a rather large rewrite: Nice because it's going to speed up
> > > > things.
> > > > 
> > > > I think we should start doing ruleset generation tracking, IIRC this
> > > > is missing in the existing codebase. Such approach requires to keep
> > > > track of the ruleset generation number and invalidate caches.
> > > 
> > > Do you think parallel ruleset updates are a problem? None of the xtables
> > > tools are meant to stay running for long, so I didn't expect problems
> > > there until now. Sure, loading 10k rules with iptables-nft-restore takes
> > > a long time, but if kernel's ruleset changes in between, I don't see how
> > > we could recover from that (apart from restarting the whole restore).
> > 
> > Yes, restarting is the only way.
> > 
> > I was not thinking on the "load big ruleset" via -restore problem
> > (it's flushing the ruleset unless noflush option is used), instead
> > incremental updates could be? eg.  robot placing rules to be
> > added/deleted/replaced in a batch file that is passed to
> > iptables-restore to update an existing ruleset.
> 
> Ah, I see. Do you think it's feasible to support that in iptables-nft
> given that new users are probably better off if they use nftables
> directly?

Right indeed, nftables is better there, but you can also use
iptables-restore to do incremental ruleset updates with noflush option
in iptables.

> > But in such case, the rules in a given chain are still consistent,
> > since we check for interference there already via EINTR in netlink
> > dumps. We just may end up caching chains with rules in different
> > generations with this new approach. Probably this may end up to
> > misleading error reporting in a heavy concurrent rules update
> > environment. We could still easily detect this by enforcing checks for
> > consistent generation idea (otherwise restart) and use
> > NFNL_BATCH_GENID attribute for "batch is based on ruleset generation
> > X semantics" (not used by userspace code yet). That could be done in a
> > follow up patchset if needed. We can probably get rid of the global
> > userspace lock for iptables at some point.
> 
> Yes, indeed. My changes allow for fetching rules of a later generation
> than the chains. At this point though, we either have a command which
> doesn't need the rule cache or chain and rule fetching happens shortly
> after each other. And once these have been fetched, no further cache
> updates happen. We also have to be careful here, because at least my
> patch fixing for wrong rule position when using '-I <NUM>' with
> iptables-nft-restore heavily depends upon a chain which has been
> removed/flushed earlier is not fetched again. Hence why I introduced
> those booleans, basically avoiding cache updates despite a possible
> asymmetry.

Sounds fine.

> That global userspace lock you're talking about is within kernel space?
> I'm asking because the locking in userspace is not required with
> iptables-nft (as opposed to legacy).

Exactly. The userspace lock is backward compatibility stuff for scripts.

OK, please revamp and send v3, thanks for addressing my feedback.