在 2019/8/2 18:45, wenxu 写道: > On 8/2/2019 7:11 AM, Jakub Kicinski wrote: >> On Thu, 1 Aug 2019 11:03:46 +0800, wenxu@xxxxxxxxx wrote: >>> From: wenxu <wenxu@xxxxxxxxx> >>> >>> The new flow-indr-block can't get the tcf_block >>> directly. It provide a callback list to find the flow_block immediately >>> when the device register and contain a ingress block. >>> >>> Signed-off-by: wenxu <wenxu@xxxxxxxxx> >> First of all thanks for splitting the series up into more patches, >> it is easier to follow the logic now! >> >>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev { >>> >>> INIT_LIST_HEAD(&indr_dev->cb_list); >>> indr_dev->dev = dev; >>> + flow_get_default_block(indr_dev); >>> if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node, >>> flow_indr_setup_block_ht_params)) { >>> kfree(indr_dev); >> I wonder if it's still practical to keep the block information in the >> indr_dev structure at all. The way this used to work was: >> >> >> [hash table of devices] -------------- >> | | netdev | >> | | refcnt | >> indir_dev[tun0]| ------ | cached block | ---- [ TC block ] >> | | callbacks | . >> | -------------- \__ [cb, cb_priv, cb_ident] >> | [cb, cb_priv, cb_ident] >> | -------------- >> | | netdev | >> | | refcnt | >> indir_dev[tun1]| ------ | cached block | ---- [ TC block ] >> | | callbacks |. >> ----------------- -------------- \__ [cb, cb_priv, cb_ident] >> [cb, cb_priv, cb_ident] >> >> >> In the example above we have two tunnels tun0 and tun1, each one has a >> indr_dev structure allocated, and for each one of them two drivers >> registered for callbacks (hence the callbacks list has two entries). >> >> We used to cache the TC block in the indr_dev structure, but now that >> there are multiple subsytems using the indr_dev we either have to have >> a list of cached blocks (with entries for each subsystem) or just always >> iterate over the subsystems :( >> >> After all the same device may have both a TC block and a NFT block. >> >> I think always iterating would be easier: >> >> The indr_dev struct would no longer have the block pointer, instead >> when new driver registers for the callback instead of: >> >> if (indr_dev->ing_cmd_cb) >> indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block, >> indr_block_cb->cb, indr_block_cb->cb_priv, >> FLOW_BLOCK_BIND); >> >> We'd have something like the loop in flow_get_default_block(): >> >> for each (subsystem) >> subsystem->handle_new_indir_cb(indr_dev, cb); >> >> And then per-subsystem logic would actually call the cb. Or: >> >> for each (subsystem) >> block = get_default_block(indir_dev) >> indr_dev->ing_cmd_cb(...) > nft dev chian is also based on register_netdevice_notifier, So for unregister case, > > the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. is right? > > So maybe we can cache the block as a list of all the subsystem in indr_dev ? when the device is unregister the nft netdev chain related to this device will also be delete through netdevice_notifier . So for unregister case,the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. cache for the block is not work because the chain already be delete and free. Maybe it improve the prio of rep_netdev_event can help this? > >