On Mon, Mar 01, 2021 at 02:27:11PM -0400, Jason Gunthorpe wrote:
> On Mon, Mar 01, 2021 at 06:20:06PM +0000, Pearson, Robert B wrote:
> > > On Mon, Mar 01, 2021 at 10:54:21AM -0600, Bob Pearson wrote:
> >
> > >> I agree that ib_device_get/put is attempting to solve a problem that
> > >> is not really very critical since ib_device is very unlikely to be
> > >> shut down in the middle of a data transfer. The driver never worried about this for years.
> > >> But now that it's been put on the table it should be done right. A
> > >> data packet arriving is completely independent of the verbs API which
> > >> *could* delete all the QPs and shut down the HCA while it was
> > >> wandering around the universe or worse yet while the packet is being processed.
> >
> > > If driver shutdown can guarantee that all pointers involved in
> > > multicast are revoked before shutdown can finish then you don't
> > > need this refcounting.
> >
> > > It was only brought up because the API that returns the ib_device
> > > from the netdev requires the refcounts as it is general purpose
> >
> > Unfortunately what you ask for is exactly what the refcounting code
> > accomplishes and I don't see a simpler way to get there. This also
> > applies to the non-multicast packets as well but all the debate has
> > been about the code in rxe_rcv_mcast_pkt() because it is more
> > blatant there or because I haven't been able to explain how it works
> > well enough.
>
> Usually in the netstack land the shutdown of the device flushes all
> this parallel work out so all the dataplane can happily ignore all
> these details.
>
> I'm not so clear on all these details and how they apply to rxe of
> course. You'd have to look at the full lifecycle of this skb and show
> that the kfree_skb happens only before any unregistration finishes.
>
> Most likely there are other bugs if the unregistration can pass while
> the skb is still out there.
>
> But, I'm not clear on how any of this works in rxe, this is just a
> general remark on how things should ideally work.

+1, I have the same understanding and expect SKBs to be flushed and new SKBs
to be prevented from entering the ib_device while it is being destroyed.

Thanks

>
> Jason