On Thu, 2018-12-06 at 12:11 -0700, Jason Gunthorpe wrote: > On Thu, Dec 06, 2018 at 02:01:54PM -0500, Doug Ledford wrote: > > On Thu, 2018-12-06 at 10:49 -0700, Jason Gunthorpe wrote: > > > On Thu, Dec 06, 2018 at 10:47:33AM +0200, Gal Pressman wrote: > > > > > > > > > +static void efa_destroy_ah_id(struct efa_dev *dev, u8 *id) { > > > > > > + u16 device_ah; > > > > > > + u32 ref_count; > > > > > > + int err; > > > > > > + > > > > > > + mutex_lock(&dev->ah_list_lock); > > > > > > > > > > Create and destroy ah for kernel consumers cannot sleep. > > > > > Though I understand that currently you don't have kernel consumers. > > > > > This should be changed to spin lock along with other allocation to GFP_ATOMIC. > > > > > > > > Create/destroy AH must be communicated with our device (in the form of an admin > > > > command), this requires us to sleep regardless of the mutex/memory allocation > > > > (we shouldn't busy wait while waiting for the device to complete the command). > > > > None of the currently supported flows are done in an atomic context, we will > > > > have to figure that out when we add support for those. > > > > > > You don't get to make up your own rules. The function pointer this is > > > being assigned to required atomic support. > > > > The core code itself treats this as non-atomic for user space calls, and > > only atomic for kernel calls. So this isn't making up their own rules, > > this is following *exactly* the same rules as the current core kernel > > code, they simply don't provide a kernel verbs API implementation, only > > a user space implementation. > > And what if tomorrow we decide, for some reason, to chuck a spinlock > or RCU critical region around the create_ah in uverbs? The function > pointer is defined to be atomic, and is atomic in *every normal > driver* so this should absolutely be OK. Then we end up having to duplicate the three or so bugfix patches where we moved roce sgid resolution out of the critical path because it can sleep. We end up making the code problems we have worse. Better to redesign this to remove that atomic assumption. Hence why I had them drop the stuff they had. If we get to the point of having two link layers that can't comply with the atomic requirement, then I think it's time to look at changing the requirement. > > Besides, they had something ugly in there and I told them to drop it, so > > you can blame me for that. > > Most likely a big wack of this driver should be using driver specific > uapi calls, not abusing the common API with proprietary stuff. Then > they don't have these API mismatch problems. > > But, that should come out of a careful think on how/if we want to > evolve the RDMA to support MPI accelerators with no kernel API. > > Jason -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part