On Wed, 2022-10-19 at 08:53 -0300, Jason Gunthorpe wrote:
> On Wed, Oct 19, 2022 at 10:31:21AM +0200, Niklas Schnelle wrote:
> > On Tue, 2022-10-18 at 12:18 -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 18, 2022 at 04:51:30PM +0200, Niklas Schnelle wrote:
> > > 
> > > > @@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev)
> > > >  		return;
> > > >  
> > > >  	spin_lock_irqsave(&s390_domain->list_lock, flags);
> > > > -	list_del_init(&zdev->iommu_list);
> > > > +	list_del_rcu(&zdev->iommu_list);
> > > >  	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
> > > 
> > > This doesn't seem obviously OK, the next steps remove the translation
> > > while we can still have concurrent RCU protected flushes going on.
> > > 
> > > Is it OK to call the flushes after zpci_dma_exit_device()/etc?
> > > 
> > > Jason
> > 
> > Interesting point. So for the flushes themselves this should be fine:
> > once zpci_unregister_ioat() has executed, all subsequent and ongoing
> > IOTLB flushes should return an error code without further adverse
> > effects. Though I think we do still have an issue in the IOTLB ops for
> > this case, as that error would skip the IOTLB flushes of the other
> > attached devices.
> 
> That sounds bad

Thankfully it's very easy to fix. Since our IOTLB flushes are per PCI
function, I just need to continue the loop in the IOTLB ops on error
instead of breaking out of it and skipping the other devices. It makes
no sense anyway to skip devices just because there is an error on
another device.

> > The bigger question, and that seems independent from RCU, is how/if
> > detach is supposed to work if there are still DMAs ongoing. Once we do
> > the zpci_unregister_ioat(), any DMA request coming from the PCI device
> > will be blocked and will lead to the PCI device being isolated (put
> > into an error state) for attempting an invalid DMA. So I had expected
> > that calls of detach/attach would happen without expected ongoing DMAs
> > and thus IOTLB flushes?
> 
> "ongoing DMA" from this device should be stopped, it doesn't mean
> that the other devices attached to the same domain are not also still
> operating and also still having flushes. So now that it is RCU a flush
> triggered by a different device will continue to see this now disabled
> device and try to flush it until the grace period.
> 
> Jason

Ok, that makes sense, thanks for the explanation. So yes, my assessment
is still that in this situation the IOTLB flush is architected to return
an error that we can ignore. Not the most elegant, I admit, but at least
it's simple. Alternatively, I guess we could use call_rcu() to do the
zpci_unregister_ioat(), but I'm not sure how to then make sure that a
subsequent zpci_register_ioat() only happens after that without adding
too much more logic.