On Fri, 2022-10-21 at 12:04 -0300, Jason Gunthorpe wrote:
> On Fri, Oct 21, 2022 at 05:01:32PM +0200, Niklas Schnelle wrote:
> > On Fri, 2022-10-21 at 10:36 -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote:
> > > > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote:
> > > > >
> > > > > > Ok, that makes sense, thanks for the explanation. So yes, my
> > > > > > assessment is still that in this situation the IOTLB flush is
> > > > > > architected to return an error that we can ignore. Not the most
> > > > > > elegant, I admit, but at least it's simple. Alternatively, I guess
> > > > > > we could use call_rcu() to do the zpci_unregister_ioat(), but I'm
> > > > > > not sure how to then make sure that a subsequent
> > > > > > zpci_register_ioat() only happens after that without adding too
> > > > > > much more logic.
> > > > >
> > > > > This won't work either as the domain could have been freed before
> > > > > the call_rcu() happens; the domain needs to be detached
> > > > > synchronously.
> > > > >
> > > > > Jason
> > > >
> > > > Yeah right, that is basically the same issue I was thinking of for
> > > > a subsequent zpci_register_ioat(). What about the obvious one: just
> > > > call synchronize_rcu() before zpci_unregister_ioat()?
> > >
> > > Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu
> > > to complete in some cases.
> > >
> > > What you have seems like it could be OK, just deal with the ugly racy
> > > failure.
> > >
> > > Jason
> >
> > I'd tend to go with synchronize_rcu(). It won't leave us with spurious
> > error logs for the failed IOTLB flushes and, as you said, one expects
> > detach to be synchronous. I don't think waiting in it will be a
> > problem. But this is definitely something you're more of an expert on,
> > so I'll trust your judgement.
> > Looking at other callers of synchronize_rcu(), quite a few of them
> > seem to be in similar detach/release situations, though I'm not sure
> > how frequent and performance-critical IOMMU domain detaching is in
> > comparison.
>
> I would not do it on domain detaching; that is something triggered by
> userspace through VFIO and it could theoretically happen a lot, e.g. in
> vIOMMU scenarios.
>
> Jason

Thanks for the explanation; I'd still like to grok this a bit more if
you don't mind. If I read things correctly, synchronize_rcu() would run
in the context of the VFIO ioctl in this case and shouldn't block
anything else in the kernel, correct? At least that's how I understand
the synchronize_rcu() comments, and the fact that e.g.
net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a
synchronize_rcu() and can be triggered from user space too. So we're
more worried about user space getting slowed down than about a
denial of service against other kernel tasks.
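A toy userspace model (C11 atomics and pthreads) of the ordering being discussed may help: unpublish the domain, wait for in-flight readers, and only then tear it down. It is not kernel code and not how RCU is implemented. toy_domain, toy_flush(), toy_detach(), run_demo() and the active_readers counter are all hypothetical, and the counter-drain loop is only a crude stand-in for synchronize_rcu(). What it tries to show is that only the detaching thread waits, while the flush side keeps running and simply finds no domain once it has been unpublished.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an IOMMU domain (not a kernel type). */
struct toy_domain {
	atomic_int iotlb_flushes;
};

/* Published domain pointer, the analogue of an RCU-protected pointer. */
static _Atomic(struct toy_domain *) attached;
/* Hypothetical count of readers inside a read-side section. */
static atomic_int active_readers;
static atomic_bool stop;

/* Read side: models an IOTLB flush done under rcu_read_lock(). */
static bool toy_flush(void)
{
	bool flushed = false;

	atomic_fetch_add(&active_readers, 1);          /* ~ rcu_read_lock() */
	struct toy_domain *d = atomic_load(&attached); /* ~ rcu_dereference() */
	if (d) {
		atomic_fetch_add(&d->iotlb_flushes, 1);
		flushed = true;
	}
	atomic_fetch_sub(&active_readers, 1);          /* ~ rcu_read_unlock() */
	return flushed;
}

/* Detach: unpublish, wait for in-flight readers, only then tear down. */
static void toy_detach(void)
{
	struct toy_domain *d = atomic_exchange(&attached, NULL);

	/* Crude stand-in for synchronize_rcu(): only this thread waits. */
	while (atomic_load(&active_readers) != 0)
		sched_yield();
	free(d); /* ~ zpci_unregister_ioat() plus freeing the domain */
}

static void *reader_thread(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop)) {
		toy_flush();
		sched_yield(); /* spend some time outside the read-side section */
	}
	return NULL;
}

/* Returns 0 if a flush issued after detach correctly finds no domain. */
int run_demo(void)
{
	enum { NREADERS = 2 };
	pthread_t readers[NREADERS];
	int started = 0;
	struct toy_domain *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	atomic_init(&d->iotlb_flushes, 0);
	atomic_store(&stop, false);
	atomic_store(&attached, d);

	for (int i = 0; i < NREADERS; i++)
		if (pthread_create(&readers[started], NULL, reader_thread, NULL) == 0)
			started++;

	toy_detach(); /* blocks only the calling thread */
	assert(atomic_load(&attached) == NULL);
	bool late = toy_flush(); /* must see the domain as gone */

	atomic_store(&stop, true);
	for (int i = 0; i < started; i++)
		pthread_join(readers[i], NULL);
	return late ? 1 : 0;
}
```

Unlike real RCU, this naive counter can in principle be held off indefinitely by back-to-back readers, so the sketch only illustrates the ordering (unpublish, wait, then tear down), not the latency characteristics of synchronize_rcu().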