On Mon, Mar 01, 2021 at 06:20:06PM +0000, Pearson, Robert B wrote:
>
> > On Mon, Mar 01, 2021 at 10:54:21AM -0600, Bob Pearson wrote:
>
> >> I agree that ib_device_get/put is attempting to solve a problem that
> >> is not really very critical since ib_device is very unlikely to be
> >> shut down in the middle of a data transfer. The driver never worried
> >> about this for years.
> >> But now that it's been put on the table it should be done right. A
> >> data packet arriving is completely independent of the verbs API which
> >> *could* delete all the QPs and shut down the HCA while it was
> >> wandering around the universe or worse yet while the packet is being
> >> processed.
>
> > If driver shutdown can guarantee that all pointers involved in
> > multicast are revoked before shutdown can finish then you don't
> > need this refcounting.
>
> > It was only brought up because the API that returns the ib_device
> > from the netdev requires the refcounts as it is general purpose
>
> Unfortunately what you ask for is exactly what the refcounting code
> accomplishes and I don't see a simpler way to get there. This also
> applies to the non-multicast packets as well but all the debate has
> been about the code in rxe_rcv_mcast_pkt() because it is more
> blatant there or because I haven't been able to explain how it works
> well enough.

Usually in the netstack land the shutdown of the device flushes all
this parallel work out so all the dataplane can happily ignore all
these details.

I'm not so clear on all these details and how they apply to rxe of
course. You'd have to look at the full lifecycle of this skb and show
that the kfree_skb happens only before any unregistration finishes.

Most likely there are other bugs if the unregistration can pass while
the skb is still out there.

But, I'm not clear on how any of this works in rxe, this is just a
general remark on how things should ideally work.

Jason