Re: [iptables PATCH v2 03/14] xtables: Implement per chain rule cache

Phil Sutter <phil@xxxxxx> · Mon, 17 Dec 2018 19:24:24 +0100

Hi Pablo,

On Mon, Dec 17, 2018 at 06:42:49PM +0100, Pablo Neira Ayuso wrote:
> On Thu, Dec 13, 2018 at 12:15:56PM +0100, Phil Sutter wrote:
> > Use recently introduced support for rules inside chains in libnftnl to
> > introduce a rule cache per chain instead of a global one.
> > 
> > A tricky bit is to decide if cache should be updated or not. Previously,
> > the global rule cache was populated just once and then reused unless
> > being flushed completely (via call to flush_rule_cache() with
> > NULL-pointer table argument). Resemble this behaviour by introducing a
> > boolean indicating cache status and fetch rules for all chains when
> > updating the chain cache in nft_chain_list_get().
> 
> This is a rather large rewrite: Nice because it's going to speed up
> things.
> 
> I think we should start doing ruleset generation tracking, IIRC this
> is missing in the existing codebase. Such approach requires to keep
> track of the ruleset generation number and invalidate caches.

Do you think parallel ruleset updates are a problem? None of the xtables
tools are meant to stay running for long, so I didn't expect problems
there until now. Sure, loading 10k rules with iptables-nft-restore takes
a long time, but if kernel's ruleset changes in between, I don't see how
we could recover from that (apart from restarting the whole restore).

> Let me do a bit of reviewing before we get this in, see below. Thanks!

Thanks for your review! Some comments below.

> 
> > Signed-off-by: Phil Sutter <phil@xxxxxx>
> > ---
> > Changes since v1:
> > - Update rule cache only if required.
> > ---
> >  iptables/nft.c             | 599 +++++++++++++++++--------------------
> >  iptables/nft.h             |   5 +-
> >  iptables/xtables-restore.c |   5 +-
> >  iptables/xtables-save.c    |   6 +-
> >  4 files changed, 282 insertions(+), 333 deletions(-)
> > 
> > diff --git a/iptables/nft.c b/iptables/nft.c
> > index e7a56778f8004..0869f2e222393 100644
> > --- a/iptables/nft.c
> > +++ b/iptables/nft.c
> [...]
> > @@ -815,23 +803,27 @@ static void flush_chain_cache(struct nft_handle *h, const char *tablename)
> >  		if (tablename && strcmp(h->tables[i].name, tablename))
> >  			continue;
> >  
> > -		if (h->table[i].chain_cache) {
> > -			if (tablename) {
> > -				nftnl_chain_list_foreach(h->table[i].chain_cache,
> > -							 __flush_chain_cache, NULL);
> > -				break;
> > -			} else {
> > -				nftnl_chain_list_free(h->table[i].chain_cache);
> > -				h->table[i].chain_cache = NULL;
> > -			}
> > +		if (!h->table[i].chain_cache) {
> > +			if (tablename)
> > +				return;
> > +			continue;
> >  		}
> > +		if (tablename) {
> > +			nftnl_chain_list_foreach(h->table[i].chain_cache,
> > +						 __flush_chain_cache, NULL);
> > +			return;
> > +		}
> > +
> > +		nftnl_chain_list_free(h->table[i].chain_cache);
> > +		h->table[i].chain_cache = NULL;
> >  	}
> > +	h->have_cache = false;
> 
> Maybe
> 
>         h->have_chain_cache
> 
> instead?

ACK. Simple 'have_cache' is too generic indeed.
> 
> > +	h->have_rule_cache = false;
> >  }
> >  
> >  void nft_fini(struct nft_handle *h)
> >  {
> >  	flush_chain_cache(h, NULL);
> > -	flush_rule_cache(h, NULL);
> >  	mnl_socket_close(h->nl);
> >  }
> >  
> > @@ -1191,12 +1183,15 @@ err:
> >  	return NULL;
> >  }
> >  
> > -static struct nftnl_rule_list *nft_rule_list_get(struct nft_handle *h);
> 
> Can you leave this function around?
> 
> Idea is that we call this function if we want a full cache, ie. chain + rules.
> 
> Then, we call nft_chain_list_get() to fetch chains only.
> 
> Otherwise, nft_rule_cache_get() ? Just trying this patch smaller here
> in this round for easier review.

Sure, no problem.

> > +static struct nftnl_chain *
> > +nft_chain_find(struct nft_handle *h, const char *table,
> > +	       const char *chain, bool need_rules);
> 
> We can probably use:
> 
>         nft_chain_find()
> 
> if we want a full cache, ie. give me chain and retrieve rules
> (populate cache if needed).
> 
> Otherwise, use:
> 
>         __nft_chain_find()
> 
> if we want only to fetch the chain cache, so we don't need the "bool
> need_rules". Only two spots use nft_chain_find(..., false) in this
> patch that I can see.

Will do.

> >  int
> >  nft_rule_append(struct nft_handle *h, const char *chain, const char *table,
> >  		void *data, uint64_t handle, bool verbose)
> >  {
> > +	struct nftnl_chain *c;
> >  	struct nftnl_rule *r;
> >  	int type;
> >  
> > @@ -1224,10 +1219,9 @@ nft_rule_append(struct nft_handle *h, const char *chain, const char *table,
> >  	if (verbose)
> >  		h->ops->print_rule(r, 0, FMT_PRINT_RULE);
> >  
> > -	if (!nft_rule_list_get(h))
> > -		return 0;
> > -
> > -	nftnl_rule_list_add_tail(r, h->rule_cache);
> > +	c = nft_chain_find(h, table, chain, true);
> 
> I guess we always find a match from nft_chain_find().
> 
> > +	if (c)
> 
> So this branch is probably not needed, otherwise do no ignore this
> error?
> 
> > +		nftnl_chain_rule_add(r, c);

Ah, yes. In a later patch of the same series I turn this into:

|         c = nft_chain_find(h, table, chain, true);
| -       if (c)
| -               nftnl_chain_rule_add(r, c);
| +       assert(c);
| +       nftnl_chain_rule_add_tail(r, c);

Because if the rule is not added to the chain cache, it won't end in the
batch. Also the change from rule_add_tail to rule_add is a mistake here.

[...]
> > @@ -1297,33 +1285,60 @@ err:
> >  	return MNL_CB_OK;
> >  }
> >  
> > +static int nft_rule_list_update(struct nftnl_chain *c, void *data);
> > +
> >  struct nftnl_chain_list *nft_chain_list_get(struct nft_handle *h,
> > -					    const char *table)
> > +					    const char *table, bool need_rules)
> >  {
> >  	char buf[16536];
> >  	struct nlmsghdr *nlh;
> >  	const struct builtin_table *t;
> >  	int ret;
> > +	int i;
> >  
> >  	t = nft_table_builtin_find(h, table);
> >  	if (!t)
> >  		return NULL;
> 
> This chunk below...
> 
> > -	if (h->table[t->type].chain_cache)
> > +	if (h->have_cache && (!need_rules || h->have_rule_cache))
> >  		return h->table[t->type].chain_cache;
> > +
> > +	if (h->have_cache)
> > +		goto fetch_rules;
> 
> probably code above can be rework to this below?
> 
>         if (h->have_cache) {
>                 if (!need_rules || h->have_rule_cache)
>          		return h->table[t->type].chain_cache;
> 
> 		goto fetch_rules;
>         }
> 
> We can probably remove "goto fetch_rules;" and make this a function.
> 
>         if (h->have_cache) {
>                 if (!need_rules || h->have_rule_cache)
>          		return h->table[t->type].chain_cache;
> 
>                 fetch_rules_only();
>         } else {
>                 fetch_full();
>         }
> 
> so we use a bit less of goto for non-error path.

Yes, good points. I will do that.

[...]
> > -static struct nftnl_rule_list *nft_rule_list_get(struct nft_handle *h)
> > +static int nft_rule_list_update(struct nftnl_chain *c, void *data)
> 
> This nft_rule_list_update() looks like our former nft_rule_list_get().

Yes, I turned nft_rule_list_get() into a callback for use in
nft_chain_list_get(). The rename is more or less a side-effect of that.
I'll see if I can reuse code between nft_rule_list_get() and the
callback (if needed at all).

> >  {
> > +	struct nft_handle *h = data;
> >  	char buf[16536];
> >  	struct nlmsghdr *nlh;
> > -	struct nftnl_rule_list *list;
> > +	struct nftnl_rule *rule;
> >  	int ret;
> >  
> > -	if (h->rule_cache)
> > -		return h->rule_cache;
> > +	rule = nftnl_rule_alloc();
> > +	if (!rule)
> > +		return false;
> >  
> > -retry:
> > -	list = nftnl_rule_list_alloc();
> > -	if (list == NULL)
> > -		return 0;
> > +	nftnl_rule_set(rule, NFTNL_RULE_TABLE,
> > +		       nftnl_chain_get_str(c, NFTNL_CHAIN_TABLE));
> 
> Use nftnl_rule_set_str() instead. I know this code it probably using
> this old interface (long time ago there was not nftnl_rule_set_str()).

Sure, will do.

[...]
> > -int nft_rule_save(struct nft_handle *h, const char *table, unsigned int format)
> > +static int nft_chain_save_rules(struct nft_handle *h,
> > +				struct nftnl_chain *c, unsigned int format)
> >  {
> > -	struct nftnl_rule_list *list;
> > -	struct nftnl_rule_list_iter *iter;
> > +	struct nftnl_rule_iter *iter;
> >  	struct nftnl_rule *r;
> >  
> > -	list = nft_rule_list_get(h);
> > -	if (list == NULL)
> > -		return 0;
> > -
> > -	iter = nftnl_rule_list_iter_create(list);
> > +	iter = nftnl_rule_iter_create(c);
> >  	if (iter == NULL)
> > -		return 0;
> > +		return 1;
> 
> Probably we can start using -1 for errors and 0 for success
> incrementally in this code? So we don't stick to the old/original
> iptables function error codes anymore from nft.c.

Fine with me! Keeping track which function has to return what is tedious
sometimes.

[...]
> > @@ -1553,6 +1584,7 @@ int nft_rule_flush(struct nft_handle *h, const char *chain, const char *table,
> >  			fprintf(stdout, "Flushing chain `%s'\n", chain_name);
> >  
> >  		__nft_rule_flush(h, table, chain_name);
> > +		flush_rule_cache(c);
> 
> Place this flush_rule_cache() call in __nft_rule_flush() ?

The function is used from __nft_chain_user_flush() as well where no
cache flush happens.

[...]
> This chunk below could be well in a separated patch for easier review.
> 
> > -static int nft_is_expr_compatible(const struct nftnl_expr *expr)
> > +static bool nft_is_expr_compatible(const struct nftnl_expr *expr)

Yes, you're right. I had to adjust the is_*_compatible functions so
decided to make them a bit more consistent while doing so. I'll move the
return type change into a prepended patch.

[...]
> > --- a/iptables/xtables-restore.c
> > +++ b/iptables/xtables-restore.c
[...]
> > @@ -155,6 +155,9 @@ void xtables_restore_parse(struct nft_handle *h,
> >  					table);
> >  				if (cb->table_flush)
> >  					cb->table_flush(h, table);
> > +				/* fake rule cache presence so that we don't populate
> > +				 * the cache later with rules just flushed here */
> 
>                                 /* incrementally adding netdev comment styles
>                                  * would be good :-).
>                                  */

So comment ending on new line? I always forget what's the right comment
style in each project. :)

Thanks, Phil