On 12/5/2018 8:08 PM, Yanjun Zhu wrote: > > On 2018/12/5 23:14, Steve Wise wrote: >> The rxe device can be removed, which includes unregistering with the >> rdma core, from 3 places: a netdev notifier call, the sysfs handler >> used to delete a rxe device, and module unload. Currently these are >> not synchronized to ensure that the device is unregistered once and the >> memory only freed once. >> >> This commit adds proper serialization. Device removal and >> deregistration >> is consolidated into rxe_net_remove() which uses dev_list_lock to >> serialize removal from rxe_dev_list and ensures only 1 deregister. >> >> Functions net_to_rxe() and get_rxe_by_name() both now take a kref >> reference with dev_list_lock held to ensure that during a race between 2 >> removes, the rxe struct memory remains around until both racers release >> the reference. So now callers to these functions must call >> rxe_dev_put() >> when they are done using the rxe pointer. >> >> Signed-off-by: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> >> --- >> drivers/infiniband/sw/rxe/rxe.c | 8 -------- >> drivers/infiniband/sw/rxe/rxe.h | 1 - >> drivers/infiniband/sw/rxe/rxe_net.c | 26 ++++++++++++++++++++++---- >> drivers/infiniband/sw/rxe/rxe_net.h | 1 + >> drivers/infiniband/sw/rxe/rxe_sysfs.c | 4 ++-- >> 5 files changed, 25 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/infiniband/sw/rxe/rxe.c >> b/drivers/infiniband/sw/rxe/rxe.c >> index 383e65c7bbc0..971f0862cefe 100644 >> --- a/drivers/infiniband/sw/rxe/rxe.c >> +++ b/drivers/infiniband/sw/rxe/rxe.c >> @@ -331,14 +331,6 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) >> return err; >> } >> -/* called by the ifc layer to remove a device */ >> -void rxe_remove(struct rxe_dev *rxe) >> -{ >> - rxe_unregister_device(rxe); >> - >> - rxe_dev_put(rxe); >> -} >> - >> static int __init rxe_module_init(void) >> { >> int err; >> diff --git a/drivers/infiniband/sw/rxe/rxe.h >> b/drivers/infiniband/sw/rxe/rxe.h >> index 8f79bd86d033..e0c0dce80bbf 100644 >> --- a/drivers/infiniband/sw/rxe/rxe.h >> +++ b/drivers/infiniband/sw/rxe/rxe.h >> @@ -96,7 +96,6 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe, >> void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); >> int rxe_add(struct rxe_dev *rxe, unsigned int mtu); >> -void rxe_remove(struct rxe_dev *rxe); >> void rxe_remove_all(void); >> void rxe_rcv(struct sk_buff *skb); >> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c >> b/drivers/infiniband/sw/rxe/rxe_net.c >> index b26a8141f3ed..6dc1a5b20e31 100644 >> --- a/drivers/infiniband/sw/rxe/rxe_net.c >> +++ b/drivers/infiniband/sw/rxe/rxe_net.c >> @@ -57,6 +57,7 @@ struct rxe_dev *net_to_rxe(struct net_device *ndev) >> list_for_each_entry(rxe, &rxe_dev_list, list) { >> if (rxe->ndev == ndev) { >> found = rxe; >> + kref_get(&rxe->ref_cnt); >> break; >> } >> } > > You add kref_get(&rxe->ref_cnt); into net_to_rxe. > > In this function, > > static int rxe_param_set_add(const char *val, const struct > kernel_param *kp) > > { > int len; > int err = 0; > char intf[32]; > struct net_device *ndev = NULL; > struct rxe_dev *rxe; > > len = sanitize_arg(val, intf, sizeof(intf)); > if (!len) { > pr_err("add: invalid interface name\n"); > err = -EINVAL; > goto err; > } > > ndev = dev_get_by_name(&init_net, intf); > if (!ndev) { > pr_err("interface %s not found\n", intf); > err = -EINVAL; > goto err; > } > > if (net_to_rxe(ndev)) { <---if this is true, > kref_get(&rxe->ref_cnt); make rxe->ref_cnt increase by 1. But this > function directly quit without handling this again. > pr_err("already configured on %s\n", intf); > > rxe_dev_put(rxe);<--this should be added here. Please pay attention > rxe->ref_cnt. > Yes, I missed this. Thanks.