Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately

Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx> · Thu, 1 Aug 2019 16:11:29 -0700

On Thu,  1 Aug 2019 11:03:46 +0800, wenxu@xxxxxxxxx wrote:
> From: wenxu <wenxu@xxxxxxxxx>
> 
> The new flow-indr-block can't get the tcf_block
> directly. It provide a callback list to find the flow_block immediately
> when the device register and contain a ingress block.
> 
> Signed-off-by: wenxu <wenxu@xxxxxxxxx>

First of all thanks for splitting the series up into more patches, 
it is easier to follow the logic now!

> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>  
>  	INIT_LIST_HEAD(&indr_dev->cb_list);
>  	indr_dev->dev = dev;
> +	flow_get_default_block(indr_dev);
>  	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>  				   flow_indr_setup_block_ht_params)) {
>  		kfree(indr_dev);

I wonder if it's still practical to keep the block information in the
indr_dev structure at all. The way this used to work was:

[hash table of devices]     --------------
                 |         |    netdev    |
                 |         |    refcnt    |
  indir_dev[tun0]|  ------ | cached block | ---- [ TC block ]
                 |         |   callbacks  | .
                 |          --------------   \__ [cb, cb_priv, cb_ident]
                 |                               [cb, cb_priv, cb_ident]
                 |          --------------
                 |         |    netdev    |
                 |         |    refcnt    |
  indir_dev[tun1]|  ------ | cached block | ---- [ TC block ]
                 |         |   callbacks  |.
-----------------           --------------   \__ [cb, cb_priv, cb_ident]
                                                 [cb, cb_priv, cb_ident]

In the example above we have two tunnels tun0 and tun1, each one has a
indr_dev structure allocated, and for each one of them two drivers
registered for callbacks (hence the callbacks list has two entries).

We used to cache the TC block in the indr_dev structure, but now that
there are multiple subsytems using the indr_dev we either have to have
a list of cached blocks (with entries for each subsystem) or just always
iterate over the subsystems :(

After all the same device may have both a TC block and a NFT block.

I think always iterating would be easier:

The indr_dev struct would no longer have the block pointer, instead
when new driver registers for the callback instead of:

	if (indr_dev->ing_cmd_cb)
		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
				     indr_block_cb->cb, indr_block_cb->cb_priv,
				     FLOW_BLOCK_BIND);

We'd have something like the loop in flow_get_default_block():

	for each (subsystem)
		subsystem->handle_new_indir_cb(indr_dev, cb);

And then per-subsystem logic would actually call the cb. Or:

	for each (subsystem)
		block = get_default_block(indir_dev)
		indr_dev->ing_cmd_cb(...)

I hope this makes sense.

Also please double check nft unload logic has an RCU synchronization
point, I'm not 100% confident rcu_barrier() implies synchronize_rcu().
Perhaps someone more knowledgeable can chime in :)