On Thu, Aug 08, 2024 at 01:05:36PM -0400, Matthew Rosato wrote:
> Since c76c067e488c ("s390/pci: Use dma-iommu layer"), we stopped
> checking for a NULL zdev->s390_domain during
> s390_iommu_release_device. This causes a crash specifically if the
> last attach_device call failed (we exit on -EIO with a NULL
> zdev->s390_domain). I suppose we could also fix the specific crash
> by restoring the NULL check in s390_iommu_release_device, but this
> seems like a band-aid and I'm concerned that there's more lurking
> here when we get to this state...

Implementing the static blocking domain and using release_domain
should fix all these crashes, I think. The core code will sequence
everything and zdev->s390_domain should never be NULL.

You'll get trapped in a blocking domain and can't exit, which will
trigger the WARN_ON, but at least for the first step of this problem
it should be enough.

> > But what is the backtrace that runs into this warn on? VFIO exiting
> > and trying to put the device back to the DMA API?
> Yes, exactly
>
> [ 198.067373] ------------[ cut here ]------------
> [ 198.067377] WARNING: CPU: 44 PID: 394 at drivers/iommu/iommu.c:122 __iommu_release_dma_ownership+0x72/0x80
> [ 198.067386] Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass kvm sunrpc s390_trng eadm_sch tape_34xx tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel loop configfs nfnetlink lcs ctcm fsm ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 nvme sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common zfcp scsi_transport_fc 8021q garp pkey zcrypt rng_core autofs4
> [ 198.067424] CPU: 44 UID: 0 PID: 394 Comm: kmcheck Not tainted 6.11.0-rc2 #111
> [ 198.067427] Hardware name: IBM 3906 M05 780 (LPAR)
> [ 198.067429] Krnl PSW : 0704c00180000000 000002bbfc744576 (__iommu_release_dma_ownership+0x76/0x80)
> [ 198.067433]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 198.067437] Krnl GPRS: 0000000000000290 0000000000000000 fffffffffffffffb 000001b386842098
> [ 198.067439]            0000000000000000 000001b3865b5c00 000000000000255b 000002bb7c586068
> [ 198.067442]            000001b3ab82a018 000001b7ff0af6c0 000001b5d0e4af68 000001b5d0e4af00
> [ 198.067444]            000001b389772300 0000023bffe33e28 000002bbfc744560 0000023bffe339d0
> [ 198.067452] Krnl Code: 000002bbfc744568: f0a0000407fe	srp	4(11,%r0),2046,0
>                          000002bbfc74456e: 47000700		bc	0,1792
>                         #000002bbfc744572: af000000		mc	0,0
>                         >000002bbfc744576: a7f4fff8		brc	15,000002bbfc744566
>                          000002bbfc74457a: 0707		bcr	0,%r7
>                          000002bbfc74457c: 0707		bcr	0,%r7
>                          000002bbfc74457e: 0707		bcr	0,%r7
>                          000002bbfc744580: c0040025f71c	brcl	0,000002bbfcc033b8
> [ 198.067468] Call Trace:
> [ 198.067482]  [<000002bbfc744576>] __iommu_release_dma_ownership+0x76/0x80
> [ 198.067486] ([<000002bbfc744560>] __iommu_release_dma_ownership+0x60/0x80)
> [ 198.067488]  [<000002bbfc7445b8>] iommu_group_release_dma_owner+0x38/0x50
> [ 198.067491]  [<000002bb7c2a1e24>] vfio_group_detach_container+0x154/0x180 [vfio]
> [ 198.067500]  [<000002bb7c2a0d88>] vfio_device_remove_group+0xd8/0x140 [vfio]
> [ 198.067505]  [<000002bb7c5648b4>] vfio_pci_core_unregister_device+0x34/0x80 [vfio_pci_core]
> [ 198.067513]  [<000002bb7c5841cc>] vfio_pci_remove+0x2c/0x40 [vfio_pci]
> [ 198.067517]  [<000002bbfc6ec7bc>] pci_device_remove+0x3c/0x90
> [ 198.067520]  [<000002bbfc75dda4>] device_release_driver_internal+0x1b4/0x260
> [ 198.067527]  [<000002bbfc6e2844>] pci_stop_bus_device+0x94/0xc0
> [ 198.067531]  [<000002bbfc6e2b66>] pci_stop_and_remove_bus_device+0x26/0x40
> [ 198.067534]  [<000002bbfc6e2bb0>] pci_stop_and_remove_bus_device_locked+0x30/0x40
> [ 198.067537]  [<000002bbfbde2838>] zpci_bus_remove_device+0x68/0xb0
> [ 198.067541]  [<000002bbfbde0780>] __zpci_event_availability+0x250/0x3a0
> [ 198.067543]  [<000002bbfc7e4528>] chsc_process_crw+0x2a8/0x2c0
> [ 198.067548]  [<000002bbfc7ec690>] crw_collect_info+0x2e0/0x360
> [ 198.067550]  [<000002bbfbe15bde>] kthread+0x11e/0x130
> [ 198.067556]  [<000002bbfbd930ec>] __ret_from_fork+0x3c/0x60
> [ 198.067558]  [<000002bbfcb233aa>] ret_from_fork+0xa/0x38
> [ 198.067564] Last Breaking-Event-Address:
> [ 198.067565]  [<000002bbfc744560>] __iommu_release_dma_ownership+0x60/0x80
> [ 198.067569] ---[ end trace 0000000000000000 ]---

> > Though I feel like more is needed here if you expect to allow the
> > nofail version of this to actually fail.. For instance a force-blocked
> > device should block driver binding through the dma_owner APIs.
>
> Yeah, that makes sense too.

And I think fixing this should be another patch, though I'm not
entirely sure what the best approach will be.

One thought is that, presumably, when the hypervisor fences the device
it asynchronously makes it blocking, which is another way of saying it
has halted all DMA from that device.

So, even if we could attach a domain, it would never be used since
there is no possible DMA. Thus, why fail the attach?
There is no way for anyone to tell the difference between an attached
translation and not attached because the DMA has been halted at the
device.

If you can tell in the s390 driver that the hypervisor has fenced the
device then maybe just reporting success to the upper layer is the
right answer here.

Jason