On Thu, Aug 08, 2024 at 01:05:36PM -0400, Matthew Rosato wrote:
> Since c76c067e488c ("s390/pci: Use dma-iommu layer"), we stopped
> checking for a NULL zdev->s390_domain during
> s390_iommu_release_device. This causes a crash specifically if the
> last attach_device call failed (we exit on -EIO with a NULL
> zdev->s390_domain). I suppose we could also fix the specific crash
> by restoring the NULL check in s390_iommu_release_device, but this
> seems like a band-aid and I'm concerned that there's more lurking
> here when we get to this state...

Implementing the static blocking domain and using release_domain
should fix all these crashes, I think. The core code will sequence
everything and zdev->s390_domain should never be NULL.

You'll get trapped in a blocking domain and can't exit, which will
trigger the WARN_ON, but at least for the first step of this problem
it should be enough.

> > But what is the backtrace that runs into this warn on? VFIO exiting
> > and trying to put the device back to the DMA API?
> Yes, exactly
>
> [ 198.067373] ------------[ cut here ]------------
> [ 198.067377] WARNING: CPU: 44 PID: 394 at drivers/iommu/iommu.c:122 __iommu_release_dma_ownership+0x72/0x80
> [ 198.067386] Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass kvm sunrpc s390_trng eadm_sch tape_34xx tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel loop configfs nfnetlink lcs ctcm fsm ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 nvme sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common zfcp scsi_transport_fc 8021q garp pkey zcrypt rng_core autofs4
> [ 198.067424] CPU: 44 UID: 0 PID: 394 Comm: kmcheck Not tainted 6.11.0-rc2 #111
> [ 198.067427] Hardware name: IBM 3906 M05 780 (LPAR)
> [ 198.067429] Krnl PSW : 0704c00180000000 000002bbfc744576 (__iommu_release_dma_ownership+0x76/0x80)
> [ 198.067433]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [ 198.067437] Krnl GPRS: 0000000000000290 0000000000000000 fffffffffffffffb 000001b386842098
> [ 198.067439]            0000000000000000 000001b3865b5c00 000000000000255b 000002bb7c586068
> [ 198.067442]            000001b3ab82a018 000001b7ff0af6c0 000001b5d0e4af68 000001b5d0e4af00
> [ 198.067444]            000001b389772300 0000023bffe33e28 000002bbfc744560 0000023bffe339d0
> [ 198.067452] Krnl Code: 000002bbfc744568: f0a0000407fe	srp	4(11,%r0),2046,0
>                          000002bbfc74456e: 47000700		bc	0,1792
>                         #000002bbfc744572: af000000		mc	0,0
>                         >000002bbfc744576: a7f4fff8		brc	15,000002bbfc744566
>                          000002bbfc74457a: 0707		bcr	0,%r7
>                          000002bbfc74457c: 0707		bcr	0,%r7
>                          000002bbfc74457e: 0707		bcr	0,%r7
>                          000002bbfc744580: c0040025f71c	brcl	0,000002bbfcc033b8
> [ 198.067468] Call Trace:
> [ 198.067482]  [<000002bbfc744576>] __iommu_release_dma_ownership+0x76/0x80
> [ 198.067486] ([<000002bbfc744560>] __iommu_release_dma_ownership+0x60/0x80)
> [ 198.067488]  [<000002bbfc7445b8>] iommu_group_release_dma_owner+0x38/0x50
> [ 198.067491]  [<000002bb7c2a1e24>] vfio_group_detach_container+0x154/0x180 [vfio]
> [ 198.067500]  [<000002bb7c2a0d88>] vfio_device_remove_group+0xd8/0x140 [vfio]
> [ 198.067505]  [<000002bb7c5648b4>] vfio_pci_core_unregister_device+0x34/0x80 [vfio_pci_core]
> [ 198.067513]  [<000002bb7c5841cc>] vfio_pci_remove+0x2c/0x40 [vfio_pci]
> [ 198.067517]  [<000002bbfc6ec7bc>] pci_device_remove+0x3c/0x90
> [ 198.067520]  [<000002bbfc75dda4>] device_release_driver_internal+0x1b4/0x260
> [ 198.067527]  [<000002bbfc6e2844>] pci_stop_bus_device+0x94/0xc0
> [ 198.067531]  [<000002bbfc6e2b66>] pci_stop_and_remove_bus_device+0x26/0x40
> [ 198.067534]  [<000002bbfc6e2bb0>] pci_stop_and_remove_bus_device_locked+0x30/0x40
> [ 198.067537]  [<000002bbfbde2838>] zpci_bus_remove_device+0x68/0xb0
> [ 198.067541]  [<000002bbfbde0780>] __zpci_event_availability+0x250/0x3a0
> [ 198.067543]  [<000002bbfc7e4528>] chsc_process_crw+0x2a8/0x2c0
> [ 198.067548]  [<000002bbfc7ec690>] crw_collect_info+0x2e0/0x360
> [ 198.067550]  [<000002bbfbe15bde>] kthread+0x11e/0x130
> [ 198.067556]  [<000002bbfbd930ec>] __ret_from_fork+0x3c/0x60
> [ 198.067558]  [<000002bbfcb233aa>] ret_from_fork+0xa/0x38
> [ 198.067564] Last Breaking-Event-Address:
> [ 198.067565]  [<000002bbfc744560>] __iommu_release_dma_ownership+0x60/0x80
> [ 198.067569] ---[ end trace 0000000000000000 ]---

> > Though I feel like more is needed here if you expect to allow the
> > nofail version of this to actually fail.. For instance a force-blocked
> > device should block driver binding through the dma_owner APIs.
>
> Yeah, that makes sense too.

And I think fixing this should be another patch, though I'm not
entirely sure what the best approach will be.

One thought is that, presumably, when the hypervisor fences the device
it asynchronously makes it blocking, which is another way of saying it
has halted all DMA from that device.

So, even if we could attach a domain, it would never be used since
there is no possible DMA. Thus, why fail the attach?
There is no way for anyone to tell the difference between an attached
translation and not attached because the DMA has been halted at the
device.

If you can tell in the s390 driver that the hypervisor has fenced the
device then maybe just reporting success to the upper layer is the
right answer here.

Jason