Re: [PATCH nf] netfilter: nf_tables: integrate pipapo into commit protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for the delay.

On Mon, 24 Apr 2023 15:33:20 +0200
Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:

> The pipapo set backend follows copy-on-update approach, maintaining one
> clone of the existing datastructure that is being updated. The clone
> and current datastructures are swapped via rcu from the commit step.
> 
> The existing integration with the commit protocol is flawed because
> there is no operation to clean up the clone if the transaction is
> aborted. Moreover, the datastructure swap happens on set element
> activation.
> 
> This patch adds two new operations for sets: commit and abort, these new
> operations are invoked from the commit and abort steps, after the
> transactions have been digested, and it updates the pipapo set backend
> to use it.
> 
> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> ---
> Hi Stefano,
> 
> I don't see any path to reset ->dirty in case that the transaction is aborted.
> This might lead to this sequence:
> 
> 1) add set element: ->insert adds new entry in the clone (preparation phase)
> 2) any command that fails (preparation phase)
> 3) abort phase (->dirty bit is left set on and priv->clone contains an partial update)

That's because nft_pipapo_activate() can't fail on elements that are
successfully inserted by nft_pipapo_insert() -- that is, the
pipapo_get() call from nft_pipapo_activate() should always succeed.

Now, let's say that one call to nft_pipapo_insert() fails: at that
point I would expect all the pending insertions to be deleted before
the abort phase completes. So yes, we have a dirty bit unnecessarily
set, but the clone shouldn't actually contain a partial update.

Regardless of that, having an actual abort operation is cleaner than
this and definitely welcome.

> I am trying to figure out if this could be related to:
> https://bugzilla.netfilter.org/show_bug.cgi?id=1583

For sure this simplifies the matter, but I'm not sure exactly in which
way it could be related (I'm not saying that there's no way, though).

>  include/net/netfilter/nf_tables.h |  3 +-
>  net/netfilter/nf_tables_api.c     | 36 +++++++++++++++++++++
>  net/netfilter/nft_set_pipapo.c    | 53 ++++++++++++++++++++++---------
>  3 files changed, 76 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index 552e19ba4f43..211921dd0ac6 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -462,7 +462,8 @@ struct nft_set_ops {
>  					       const struct nft_set *set,
>  					       const struct nft_set_elem *elem,
>  					       unsigned int flags);
> -
> +	void				(*commit)(const struct nft_set *set);
> +	void				(*abort)(const struct nft_set *set);
>  	u64				(*privsize)(const struct nlattr * const nla[],
>  						    const struct nft_set_desc *desc);
>  	bool				(*estimate)(const struct nft_set_desc *desc,
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 0e072b2365df..ef8d9f6a7e9c 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -9259,6 +9259,22 @@ static void nf_tables_commit_audit_log(struct list_head *adl, u32 generation)
>  	}
>  }
>  
> +static void nft_set_commit_update(struct net *net)
> +{
> +	struct nftables_pernet *nft_net = nft_pernet(net);
> +	struct nft_table *table;
> +	struct nft_set *set;
> +
> +	list_for_each_entry(table, &nft_net->tables, list) {
> +		list_for_each_entry(set, &table->sets, list) {
> +			if (!set->ops->commit)
> +				continue;
> +
> +			set->ops->commit(set);
> +		}
> +	}
> +}
> +
>  static int nf_tables_commit(struct net *net, struct sk_buff *skb)
>  {
>  	struct nftables_pernet *nft_net = nft_pernet(net);
> @@ -9513,6 +9529,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
>  		}
>  	}
>  
> +	nft_set_commit_update(net);
> +
>  	nft_commit_notify(net, NETLINK_CB(skb).portid);
>  	nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
>  	nf_tables_commit_audit_log(&adl, nft_net->base_seq);
> @@ -9572,6 +9590,22 @@ static void nf_tables_abort_release(struct nft_trans *trans)
>  	kfree(trans);
>  }
>  
> +static void nft_set_abort_update(struct net *net)
> +{
> +	struct nftables_pernet *nft_net = nft_pernet(net);
> +	struct nft_table *table;
> +	struct nft_set *set;
> +
> +	list_for_each_entry(table, &nft_net->tables, list) {
> +		list_for_each_entry(set, &table->sets, list) {
> +			if (!set->ops->abort)
> +				continue;
> +
> +			set->ops->abort(set);
> +		}
> +	}
> +}
> +
>  static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
>  {
>  	struct nftables_pernet *nft_net = nft_pernet(net);
> @@ -9737,6 +9771,8 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
>  		}
>  	}
>  
> +	nft_set_abort_update(net);
> +
>  	synchronize_rcu();
>  
>  	list_for_each_entry_safe_reverse(trans, next,

I didn't imagine it would be so simple -- I would have gone with this
right away. On the other hand, I had no idea where to put those calls :)

Now that we have commit and abort operations, I'm wondering: what if we
make activate and deactivate callbacks optional? They're not really
needed by nft_set_pipapo: all the elements could be directly "active"
on insertion, and remain active until we have a commit operation after
they are deleted.

I'm not sure if there are drawbacks, and I think that your patch works
anyway as expected.

> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index 06d46d182634..06b8b26e666a 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -1600,17 +1600,10 @@ static void pipapo_free_fields(struct nft_pipapo_match *m)
>  	}
>  }
>  
> -/**
> - * pipapo_reclaim_match - RCU callback to free fields from old matching data
> - * @rcu:	RCU head
> - */
> -static void pipapo_reclaim_match(struct rcu_head *rcu)

/**
 * pipapo_free_match() - Free fields from unused matching data
 * @m:		Matching data
 */

> +static void pipapo_free_match(struct nft_pipapo_match *m)
>  {
> -	struct nft_pipapo_match *m;
>  	int i;
>  
> -	m = container_of(rcu, struct nft_pipapo_match, rcu);
> -
>  	for_each_possible_cpu(i)
>  		kfree(*per_cpu_ptr(m->scratch, i));
>  
> @@ -1625,7 +1618,19 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
>  }
>  
>  /**
> - * pipapo_commit() - Replace lookup data with current working copy
> + * pipapo_reclaim_match - RCU callback to free fields from old matching data

 * pipapo_reclaim_match() - RCU callback to free unused matching data

...it's not necessarily "old" anymore.

> + * @rcu:	RCU head
> + */
> +static void pipapo_reclaim_match(struct rcu_head *rcu)
> +{
> +	struct nft_pipapo_match *m;
> +
> +	m = container_of(rcu, struct nft_pipapo_match, rcu);
> +	pipapo_free_match(m);
> +}
> +
> +/**
> + * nft_pipapo_commit() - Replace lookup data with current working copy
>   * @set:	nftables API set representation
>   *
>   * While at it, check if we should perform garbage collection on the working
> @@ -1635,7 +1640,7 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
>   * We also need to create a new working copy for subsequent insertions and
>   * deletions.
>   */
> -static void pipapo_commit(const struct nft_set *set)
> +static void nft_pipapo_commit(const struct nft_set *set)
>  {
>  	struct nft_pipapo *priv = nft_set_priv(set);
>  	struct nft_pipapo_match *new_clone, *old;
> @@ -1660,6 +1665,24 @@ static void pipapo_commit(const struct nft_set *set)
>  	priv->clone = new_clone;
>  }
>  
> +static void nft_pipapo_abort(const struct nft_set *set)

/**
 * nft_pipapo_abort() - Drop uncommitted matching data if any, reset dirty flag
 * @set:	nftables API set representation
 */

> +{
> +	struct nft_pipapo *priv = nft_set_priv(set);
> +	struct nft_pipapo_match *new_clone;
> +
> +	if (!priv->dirty)
> +		return;
> +
> +	new_clone = pipapo_clone(priv->match);
> +	if (IS_ERR(new_clone))
> +		return;
> +
> +	priv->dirty = false;
> +
> +	pipapo_free_match(priv->clone);
> +	priv->clone = new_clone;

This is essentially a non-RCU version of (the new) nft_pipapo_commit(),
minus the garbage collection stuff, but I think a tiny bit of
duplication here as you already implemented it is clearer than trying
to factor this out into some helper.

> +}
> +
>  /**
>   * nft_pipapo_activate() - Mark element reference as active given key, commit
>   * @net:	Network namespace
> @@ -1667,8 +1690,7 @@ static void pipapo_commit(const struct nft_set *set)
>   * @elem:	nftables API element representation containing key data
>   *
>   * On insertion, elements are added to a copy of the matching data currently
> - * in use for lookups, and not directly inserted into current lookup data, so
> - * we'll take care of that by calling pipapo_commit() here. Both
> + * in use for lookups, and not directly inserted into current lookup data. Both
>   * nft_pipapo_insert() and nft_pipapo_activate() are called once for each
>   * element, hence we can't purpose either one as a real commit operation.
>   */
> @@ -1684,8 +1706,6 @@ static void nft_pipapo_activate(const struct net *net,
>  
>  	nft_set_elem_change_active(net, set, &e->ext);
>  	nft_set_elem_clear_busy(&e->ext);
> -
> -	pipapo_commit(set);
>  }
>  
>  /**
> @@ -1931,7 +1951,6 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
>  		if (i == m->field_count) {
>  			priv->dirty = true;
>  			pipapo_drop(m, rulemap);
> -			pipapo_commit(set);
>  			return;
>  		}
>  
> @@ -2230,6 +2249,8 @@ const struct nft_set_type nft_set_pipapo_type = {
>  		.init		= nft_pipapo_init,
>  		.destroy	= nft_pipapo_destroy,
>  		.gc_init	= nft_pipapo_gc_init,
> +		.commit		= nft_pipapo_commit,
> +		.abort		= nft_pipapo_abort,
>  		.elemsize	= offsetof(struct nft_pipapo_elem, ext),
>  	},
>  };
> @@ -2252,6 +2273,8 @@ const struct nft_set_type nft_set_pipapo_avx2_type = {
>  		.init		= nft_pipapo_init,
>  		.destroy	= nft_pipapo_destroy,
>  		.gc_init	= nft_pipapo_gc_init,
> +		.commit		= nft_pipapo_commit,
> +		.abort		= nft_pipapo_abort,
>  		.elemsize	= offsetof(struct nft_pipapo_elem, ext),
>  	},
>  };

Everything else looks good to me, thanks!

-- 
Stefano




[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux