On Fri, Dec 14, 2018 at 07:28:19PM -0700, Parav Pandit wrote: > > -/** > > - * ib_unregister_device - Unregister an IB device > > - * @device:Device to unregister > > +/* Returns false if the device is already undergoing unregistration in > > +another > > + * thread, and the device pointer may be invalid upon return. > > * > > - * Unregister an IB device. All clients will receive a remove callback. > > + * If true is returned then the caller owns a kref associated with the > > + device > > + * (taken from the kref owned by the list). > > + * > > + * Upon return the device cannot be returned by any 'get' functions. > > */ > > -void ib_unregister_device(struct ib_device *device) > > +static bool ib_pre_unregister(struct ib_device *device) { > > + bool already_removed; > > + > > + mutex_lock(&device_mutex); > > + down_write(&lists_rwsem); > > + already_removed = list_empty(&device->core_list); > > + list_del_init(&device->core_list); > > + up_write(&lists_rwsem); > > + mutex_unlock(&device_mutex); > > No. you cannot unlock device_mutex here. Now that device is removed > from the list, another ib_register_device tries to create device > with same name and fails to creates sysfs entries. Hmm. Perhaps this could be a flag then, or perhaps we can have name assignment check the actual device list.. The same issue is with the list_move below, so this needs a more careful think. > > + /* Matches the recound_set in ib_alloc_device */ > Did you mean refcount set? yes > > + while ((device = list_first_entry_or_null( > > + &driver_devs, struct ib_device, core_list))) { > > + mutex_unlock(&device_mutex); > > + /* > > + * We are holding the unregister_rwsem, so it is impossible > > + * for another thread to be doing registration. > registration or unregistration? unregistration > > @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb) > > struct udphdr *udph; > > struct net_device *ndev = skb->dev; > > struct net_device *rdev = ndev; > > - struct rxe_dev *rxe = net_to_rxe(ndev); > > + struct rxe_dev *rxe = rxe_get_dev_from_net(ndev); > > struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); > > > > if (!rxe && is_vlan_dev(rdev)) { > > rdev = vlan_dev_real_dev(ndev); > > - rxe = net_to_rxe(rdev); > > + rxe = rxe_get_dev_from_net(rdev); > > Please, let's not lock-unlock a read semaphore on every incoming RoCE udp packet that too in NAPI's non-blocking context. This is an atomic context? That will need a bunch more code to solve properly :\ > > @@ -558,36 +528,18 @@ struct rxe_dev *rxe_net_add(struct net_device > > *ndev) > > rxe = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe)); > > if (!rxe) > > return NULL; > > - > > + rxe->ib_dev.driver_unregister = rxe_unregistered; > > rxe->ndev = ndev; > > > > err = rxe_add(rxe, ndev->mtu); > > if (err) { > > - ib_dealloc_device(&rxe->ib_dev); > Unbalanced ib_alloc_device and ib_dealloc_device doesn't sound like a good approach. > There should be a better way to do this. Er.. that kind of looks wrong actually, this looks like it needs the dealloc still. Once the register succeeds the dealloc responsibility falls to the core code. > I think driver unload and device unregistration related complexity > should be left in the rxe driver. no, too many drivers have this pattern and all are getting it wrong. The core code needs to handle more of it for the driver. Jason