On 8/2/2019 7:11 AM, Jakub Kicinski wrote: > On Thu, 1 Aug 2019 11:03:46 +0800, wenxu@xxxxxxxxx wrote: >> From: wenxu <wenxu@xxxxxxxxx> >> >> The new flow-indr-block can't get the tcf_block >> directly. It provide a callback list to find the flow_block immediately >> when the device register and contain a ingress block. >> >> Signed-off-by: wenxu <wenxu@xxxxxxxxx> > First of all thanks for splitting the series up into more patches, > it is easier to follow the logic now! > >> @@ -328,6 +348,7 @@ struct flow_indr_block_dev { >> >> INIT_LIST_HEAD(&indr_dev->cb_list); >> indr_dev->dev = dev; >> + flow_get_default_block(indr_dev); >> if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, >> flow_indr_setup_block_ht_params)) { >> kfree(indr_dev); > I wonder if it's still practical to keep the block information in the > indr_dev structure at all. The way this used to work was: > > > [hash table of devices] -------------- > | | netdev | > | | refcnt | > indir_dev[tun0]| ------ | cached block | ---- [ TC block ] > | | callbacks | . > | -------------- \__ [cb, cb_priv, cb_ident] > | [cb, cb_priv, cb_ident] > | -------------- > | | netdev | > | | refcnt | > indir_dev[tun1]| ------ | cached block | ---- [ TC block ] > | | callbacks |. > ----------------- -------------- \__ [cb, cb_priv, cb_ident] > [cb, cb_priv, cb_ident] > > > In the example above we have two tunnels tun0 and tun1, each one has a > indr_dev structure allocated, and for each one of them two drivers > registered for callbacks (hence the callbacks list has two entries). > > We used to cache the TC block in the indr_dev structure, but now that > there are multiple subsytems using the indr_dev we either have to have > a list of cached blocks (with entries for each subsystem) or just always > iterate over the subsystems :( > > After all the same device may have both a TC block and a NFT block. > > I think always iterating would be easier: > > The indr_dev struct would no longer have the block pointer, instead > when new driver registers for the callback instead of: > > if (indr_dev->ing_cmd_cb) > indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block, > indr_block_cb->cb, indr_block_cb->cb_priv, > FLOW_BLOCK_BIND); > > We'd have something like the loop in flow_get_default_block(): > > for each (subsystem) > subsystem->handle_new_indir_cb(indr_dev, cb); > > And then per-subsystem logic would actually call the cb. Or: > > for each (subsystem) > block = get_default_block(indir_dev) > indr_dev->ing_cmd_cb(...) nft dev chian is also based on register_netdevice_notifier, So for unregister case, the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. is right? So maybe we can cache the block as a list of all the subsystem in indr_dev ? > I hope this makes sense. > > > Also please double check nft unload logic has an RCU synchronization > point, I'm not 100% confident rcu_barrier() implies synchronize_rcu(). > Perhaps someone more knowledgeable can chime in :) >