> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 08 March 2023 15:55
> To: 'Nicolin Chen' <nicolinc@xxxxxxxxxx>
> Cc: Xu, Terrence <terrence.xu@xxxxxxxxx>; Liu, Yi L <yi.l.liu@xxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Tian,
> Kevin <kevin.tian@xxxxxxxxx>; joro@xxxxxxxxxx; robin.murphy@xxxxxxx;
> cohuck@xxxxxxxxxx; eric.auger@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> mjrosato@xxxxxxxxxxxxx; chao.p.peng@xxxxxxxxxxxxxxx;
> yi.y.sun@xxxxxxxxxxxxxxx; peterx@xxxxxxxxxx; jasowang@xxxxxxxxxx;
> lulu@xxxxxxxxxx; suravee.suthikulpanit@xxxxxxx;
> intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> linux-s390@xxxxxxxxxxxxxxx; Hao, Xudong <xudong.hao@xxxxxxxxx>; Zhao,
> Yan Y <yan.y.zhao@xxxxxxxxx>
> Subject: RE: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
>
> [...]
> > > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi wrote:
> > > > > Hi Nicolin,
> > > > >
> > > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > > branch corresponding to the above one?
> > > > >
> > > > > I tried the
> > > > > https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
> > > > > but for some reason not able to launch the Guest.
> > > > >
> > > > > Please let me know.
> > > >
> > > > I do use that branch. It might not be that robust though as it
> > > > went through a big rebase.
> > >
> > > Ok. The issue seems to be quite random in nature and only happens
> > > when there are multiple vCPUs. Also doesn't look like related to
> > > VFIO device assignment as I can reproduce Guest hang without it by
> > > only having nested-smmuv3 and iommufd object.
> > >
> > > ./qemu-system-aarch64-iommuf -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > > -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
> > > -object iommufd,id=iommufd0 \
> > > -bios QEMU_EFI.fd \
> > > -kernel Image-6.2-iommufd \
> > > -initrd rootfs-iperf.cpio \
> > > -net none \
> > > -nographic \
> > > -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> > > -trace events=events \
> > > -D trace_iommufd
> > >
> > > When the issue happens, no output on terminal as if Qemu is in a
> > > locked state.
> > >
> > > > Can you try with the followings?
> > > >
> > > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > > --trace "msi_*" --trace "nvme_*"
> > >
> > > The only trace events with above are this,
> > >
> > > iommufd_backend_connect fd=22 owned=1 users=1 (0)
> > > smmu_add_mr smmuv3-iommu-memory-region-0-0
> > >
> > > I haven't debugged this further. Please let me know if issue is
> > > reproducible with multiple vCPUs at your end. For now will focus on
> > > VFIO dev specific tests.
> >
> > Oh. My test environment has been a single-core vCPU. So that doesn't
> > happen to me. Can you try a vanilla QEMU branch that our nesting
> > branch is rebased on? I took a branch from Yi as the baseline, while
> > he might take from Eric for the rfcv3.
> >
> > I am guessing that it might be an issue in the common tree.
>
> Yes, that looks like the case.
> I tried with:
> commit 13356edb8750("Merge tag 'block-pull-request' of
> https://gitlab.com/stefanha/qemu into staging")
>
> And issue is still there. So hopefully once we rebase everything it will go
> away.

Hi Nicolin,

I rebased your latest Qemu branch[1] on top of v7.2.0 and have not observed
the above issue so far. However, I noticed a couple of other issues when we
try to hot add/remove devices.

(qemu) device_del net1
qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy

Ignoring the MMIO UNMAP errors, it looks like the objects are not freed
properly on the dev removal path. I have a few quick fixes for this here:
https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

With the above, it seems the HWPT/IOAS objects are destroyed properly on the
dev detach path. But when the dev is added back, Qemu gets a seg fault, and
so far I have no clue why that happens.

(qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
./qemu_run-iommufd-nested: line 13: 7041 Segmentation fault (core dumped) ./qemu-system-aarch64-iommufd -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel Image-iommufd -initrd rootfs-iperf.cpio -device ioh3420,id=rp1 -device vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D trace_iommufd

There is no kernel log or crash, and there are not many useful traces, when
this happens. I understand these are early days and it is not robust in any
way, but please let me know if you suspect anything. I will continue
debugging and will update if I find anything.

Thanks,
Shameer

[1] https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
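
P.S. For anyone comparing the failed unmaps across runs, here is a rough Python sketch that pulls the ranges out of the vfio_dma_unmap error lines quoted above. It assumes the second and third arguments in those lines are the IOVA and the size; the helper name failed_unmaps and the LOG variable are only illustrative and are not part of Qemu.

```python
import errno
import re

# Two error lines copied verbatim from the device_del output above.
LOG = """\
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
"""

# Assumption: arguments are (container pointer, iova, size), followed by the return code.
PATTERN = re.compile(
    r"vfio_dma_unmap\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\) = (-?\d+)"
)


def failed_unmaps(text):
    """Return sorted (start, end, ret) tuples, with end exclusive."""
    found = []
    for match in PATTERN.finditer(text):
        iova = int(match.group(2), 16)
        size = int(match.group(3), 16)
        found.append((iova, iova + size, int(match.group(4))))
    return sorted(found)


ranges = failed_unmaps(LOG)
for start, end, ret in ranges:
    # errno.errorcode maps a positive errno value to its symbolic name.
    name = errno.errorcode.get(-ret, "?")
    print(f"[{start:#x}, {end:#x}) ret={ret} ({name})")
```

Under that assumption the two failed ranges sit next to each other with a single 0x1000 hole between them; the script only reports the ranges and does not say what the hole is.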