On Mon, 22 Apr 2024 17:00:25 +0800 Heng Qi wrote: > 在 2024/4/19 上午8:48, Jakub Kicinski 写道: > > On Wed, 17 Apr 2024 23:55:44 +0800 Heng Qi wrote: > >> $ ethtool -c ethx > >> ... > >> rx-eqe-profile: > >> {.usec = 1, .pkts = 256, .comps = 0,}, > >> {.usec = 8, .pkts = 256, .comps = 0,}, > >> {.usec = 64, .pkts = 256, .comps = 0,}, > >> {.usec = 128, .pkts = 256, .comps = 0,}, > >> {.usec = 256, .pkts = 256, .comps = 0,} > >> rx-cqe-profile: n/a > >> tx-eqe-profile: n/a > >> tx-cqe-profile: n/a > > I don't think that exposing CQE vs EQE mode makes much sense here. > > We enable the right mode via: > > > > struct kernel_ethtool_coalesce { > > u8 use_cqe_mode_tx; > > u8 use_cqe_mode_rx; > > I think it is reasonable to use 4 types: > > dim lib itself is designed with 4 independent arrays, which are queried > by dim->mode and > dim->profile_ix. This way allows users to manually modify the cqe mode > of tx or rx, and the > dim interface can automatically match the profile of the corresponding > mode when obtaining > the results. I'm guessing that DIM has 4 profiles because it was easier to implement for profiles hardcoded by DIM itself(!) Even for driver-declared profiles the distinction is a hack. > Merely exposing rx_profile and tx_profile does not seem to make the > interface and code clearer. > > > the user needs to set the packets and usecs in an educated way > > (matching the mode). > > I think this is acceptable. In addition to the above reasons, users can > already set the cqe > mode of tx and rx in user mode, which means that they have been educated. > > > The kernel never changes the mode. > > Sorry, I don't seem to understand what this means. Kernel never changes the mode for its own reasons. Only user can change the mode. So we don't have to always have both CQE and EQE mode tables ready. If the kernel changed the mode for some reason we'd have to get both from the user, in case kernel wanted to switch. > >> /** > >> * register_netdevice() - register a network device > >> * @dev: device to register > >> @@ -10258,6 +10310,10 @@ int register_netdevice(struct net_device *dev) > >> if (ret) > >> return ret; > >> > >> + ret = dev_dim_profile_init(dev); > >> + if (ret) > >> + return ret; > > This is fine but the driver still has to manually do bunch of init: > > > > INIT_WORK(&vi->rq[i].dim.work, virtnet_rx_dim_work); > > vi->rq[i].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE; > > > > It'd be better to collect all this static info (flags, mode, work func) > > in one place / struct, attached to netdev or netdev_ops. Then the > > driver can call a helper like which only needs to take netdev and dim > > as arguments. > > If mode is put into netdev, then use_cqe_mode_rx and use_cqe_mode_tx > need to be moved into netdev too. > Then dim->mode is no longer required when dim obtain its results. > We need to transform the way all drivers with dim currently behave. Hopefully that won't be too hard. > But why should work be put into netdev? > The driver still requires a for loop to initialize dim work for a > driver-specific queues. It's easier to refactor and extend the API when there's an explicit call done to the core by the driver, rather than driver poking values into random fields of the core structure. > >> + * coalesce_put_profile - fill reply with a nla nest with four child nla nests. > >> + * @skb: socket buffer the message is stored in > >> + * @attr_type: nest attr type ETHTOOL_A_COALESCE_*X_*QE_PROFILE > >> + * @profile: data passed to userspace > >> + * @supported_params: modifiable parameters supported by the driver > >> + * > >> + * Put a dim profile nest attribute. Refer to ETHTOOL_A_MODERATIONS_MODERATION. > >> + * > >> + * Return: false to indicate successful placement or no placement, and > >> + * true to pass the -EMSGSIZE error to the wrapper. > > Why the bool? Doesn't most of the similar code return the error? > > Its wrapper function ethnl_default_doit only recognizes the EMSGSIZE code. Not sure what you mean. > >> @@ -227,7 +315,19 @@ const struct nla_policy ethnl_coalesce_set_policy[] = { > >> [ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES] = { .type = NLA_U32 }, > >> [ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES] = { .type = NLA_U32 }, > >> [ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS] = { .type = NLA_U32 }, > >> + [ETHTOOL_A_COALESCE_RX_EQE_PROFILE] = { .type = NLA_NESTED }, > >> + [ETHTOOL_A_COALESCE_RX_CQE_PROFILE] = { .type = NLA_NESTED }, > >> + [ETHTOOL_A_COALESCE_TX_EQE_PROFILE] = { .type = NLA_NESTED }, > >> + [ETHTOOL_A_COALESCE_TX_CQE_PROFILE] = { .type = NLA_NESTED }, > > NLA_POLICY_NESTED(coalesce_set_profile_policy) > > This doesn't work because the first level of sub-nesting of > ETHTOOL_A_COALESCE_RX_CQE_PROFILE is ETHTOOL_A_PROFILE_IRQ_MODERATION. So declare a policy for IRQ_MODERATION which has one entry -> nested profile policy?