On Sun, Dec 09, 2018 at 11:27:41AM +0200, Gal Pressman wrote: > On 06-Dec-18 22:07, Jason Gunthorpe wrote: > > On Thu, Dec 06, 2018 at 02:53:36PM -0500, Doug Ledford wrote: > > > >> Then we end up having to duplicate the three or so bugfix patches where > >> we moved roce sgid resolution out of the critical path because it can > >> sleep. We end up making the code problems we have worse. Better to > >> redesign this to remove that atomic assumption. Hence why I had them > >> drop the stuff they had. > > > > The resolution is a slightly different issue, that was to do with > > creating the rdma_ah_attr. Once the attr is created then the create_ah > > driver callback is non-sleepable.. > > > >> If we get to the point of having two link layers that can't comply > >> with the atomic requirement, then I think it's time to look at > >> changing the requirement. > > > > Sure, and the people who want it changed are responsible :) Don't just > > sweep the issue under the rung by ignoring the API contract 'because > > it works for me today' > > > > Jason > > > > We're not trying to sweep anything under the rug, we just can't comply with the > atomic requirement. > > We have two options: > 1. Comply with the atomic requirement, this means we will poll the device for > completion instead of waiting for an interrupt. > 2. Remove the atomic requirement. You could probably add a 'sleepable' flag to the create_ah function. EFA could do EOPNOTSUPP if called this way. This would be a good improvement overall as some drivers are already busy looping in create_ah and could avoid that in sleepable contexts. Jason