Re: [iptables PATCH 3/3] nft: Fix for concurrent noflush restore calls

Phil Sutter <phil@xxxxxx> · Tue, 13 Oct 2020 12:08:03 +0200

On Mon, Oct 12, 2020 at 02:54:50PM +0200, Pablo Neira Ayuso wrote:
[...]
> > -	if (h->family == NFPROTO_BRIDGE)
> > -		nftnl_chain_set_u32(c, NFTNL_CHAIN_POLICY, NF_ACCEPT);
> > +		list = nft_chain_list_get(h, table, chain);
> > +		if (list)
> > +			nftnl_chain_list_add(c, list);
> > +	} else {
> > +		/* If the chain should vanish meanwhile, kernel genid changes
> > +		 * and the transaction is refreshed enabling the chain add
> > +		 * object. With the handle still set, kernel interprets it as a
> > +		 * chain replace job and errors since it is not found anymore.
> > +		 */
> > +		nftnl_chain_unset(c, NFTNL_CHAIN_HANDLE);
> > +	}
> >  
> > -	if (!created)
> > -		return 1;
> > +	__nft_rule_flush(h, table, chain, false, created);
> 
> I like this trick.
> 
> If I understood correct, you always place an "add chain" command in
> first place before the flush, whether the chain exists or not, so the
> flush always succeeds.

Yes, this was already done for "add table". So I'm "merely" extending
Florian's refresh transaction logic with regards to 'skip' flag.

> > -	if (!batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c))
> > +	obj = batch_chain_add(h, NFT_COMPAT_CHAIN_USER_ADD, c);
> > +	if (!obj)
> >  		return 0;
> >  
> > -	list = nft_chain_list_get(h, table, chain);
> > -	if (list)
> > -		nftnl_chain_list_add(c, list);
> > +	obj->skip = !created;
> >  
> >  	/* the core expects 1 for success and 0 for error */
> >  	return 1;
> > @@ -2649,11 +2649,6 @@ static void nft_refresh_transaction(struct nft_handle *h)
> >  	h->error.lineno = 0;
> >  
> >  	list_for_each_entry_safe(n, tmp, &h->obj_list, head) {
> > -		if (n->implicit) {
> > -			batch_obj_del(h, n);
> > -			continue;
> > -		}
> > -
> >  		switch (n->type) {
> >  		case NFT_COMPAT_TABLE_FLUSH:
> >  			tablename = nftnl_table_get_str(n->table, NFTNL_TABLE_NAME);
> > @@ -2679,14 +2674,22 @@ static void nft_refresh_transaction(struct nft_handle *h)
> >  
> >  			c = nft_chain_find(h, tablename, chainname);
> >  			if (c) {
> > -				/* -restore -n flushes existing rules from redefined user-chain */
> > -				__nft_rule_flush(h, tablename,
> > -						 chainname, false, true);
> >  				n->skip = 1;
> >  			} else if (!c) {
> >  				n->skip = 0;
> >  			}
> >  			break;
> > +		case NFT_COMPAT_RULE_FLUSH:
> > +			tablename = nftnl_rule_get_str(n->rule, NFTNL_RULE_TABLE);
> > +			if (!tablename)
> > +				continue;
> > +
> > +			chainname = nftnl_rule_get_str(n->rule, NFTNL_RULE_CHAIN);
> > +			if (!chainname)
> > +				continue;
> > +
> > +			n->skip = !nft_chain_find(h, tablename, chainname);
> 
> So n->skip equals true if the chain does not exists in the cache, so
> this flush does not fail (after this chain is gone because someone
> else have remove it).

Yes, the tricky bit is being able to adjust the batch object list no
matter how kernel ruleset changes. The only way is to assume the most
overhead and temporary disable individual jobs if not needed. This way
we can enable/disable while refreshing the transaction.

> Patch LGTM, thanks Phil.
> 
> What I don't clearly see yet is what scenario is triggering the bug in
> the existing code, if you don't mind to explain.

See the test case attached to the patch: An other iptables-restore
process may add references (i.e., jumps) to a chain the own
iptables-restore process wants to delete. This should not be a problem
because these references are added to a chain that is being flushed by
the own process as well. But if that chain doesn't exist while the own
process fetches kernel's ruleset, this flush job is not created.

Cheers, Phil