Re: [PATCH v5 00/19] Add vfio_device cdev for iommufd support

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: 08 March 2023 15:55
> To: 'Nicolin Chen' <nicolinc@xxxxxxxxxx>
> Cc: Xu, Terrence <terrence.xu@xxxxxxxxx>; Liu, Yi L <yi.l.liu@xxxxxxxxx>;
> Jason Gunthorpe <jgg@xxxxxxxxxx>; alex.williamson@xxxxxxxxxx; Tian,
> Kevin <kevin.tian@xxxxxxxxx>; joro@xxxxxxxxxx; robin.murphy@xxxxxxx;
> cohuck@xxxxxxxxxx; eric.auger@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> mjrosato@xxxxxxxxxxxxx; chao.p.peng@xxxxxxxxxxxxxxx;
> yi.y.sun@xxxxxxxxxxxxxxx; peterx@xxxxxxxxxx; jasowang@xxxxxxxxxx;
> lulu@xxxxxxxxxx; suravee.suthikulpanit@xxxxxxx;
> intel-gvt-dev@xxxxxxxxxxxxxxxxxxxxx; intel-gfx@xxxxxxxxxxxxxxxxxxxxx;
> linux-s390@xxxxxxxxxxxxxxx; Hao, Xudong <xudong.hao@xxxxxxxxx>; Zhao,
> Yan Y <yan.y.zhao@xxxxxxxxx>
> Subject: RE: [PATCH v5 00/19] Add vfio_device cdev for iommufd support
> 

[...]
> > > > On Thu, Mar 02, 2023 at 09:43:00AM +0000, Shameerali Kolothum Thodi wrote:
> > > >
> > > > > Hi Nicolin,
> > > > >
> > > > > > Thanks for the latest ARM64 branch. Do you have a working Qemu
> > > > > > branch corresponding to the above one?
> > > > > >
> > > > > > I tried the
> > > > > > https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3
> > > > > > but for some reason not able to launch the Guest.
> > > > >
> > > > > Please let me know.
> > > >
> > > > I do use that branch. It might not be that robust though as it
> > > > went through a big rebase.
> > >
> > > Ok. The issue seems to be quite random in nature and only happens
> > > when there are multiple vCPUs. Also doesn't look like related to
> > > VFIO device assignment as I can reproduce Guest hang without it by
> > > only having nested-smmuv3 and iommufd object.
> > >
> > > > ./qemu-system-aarch64-iommuf \
> > > > -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > > > -enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 \
> > > > -object iommufd,id=iommufd0 \
> > > > -bios QEMU_EFI.fd \
> > > > -kernel Image-6.2-iommufd \
> > > > -initrd rootfs-iperf.cpio \
> > > > -net none \
> > > > -nographic \
> > > > -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> > > > -trace events=events \
> > > > -D trace_iommufd
> > >
> > > > When the issue happens, no output on terminal as if Qemu is in a
> > > > locked state.
> > > >
> > > > > Can you try with the followings?
> > > >
> > > > --trace "iommufd*" --trace "smmu*" --trace "vfio_*" --trace "pci_*"
> > > > --trace "msi_*" --trace "nvme_*"
> > >
> > > The only trace events with above are this,
> > >
> > > iommufd_backend_connect fd=22 owned=1 users=1 (0) smmu_add_mr
> > > smmuv3-iommu-memory-region-0-0
> > >
> > > > I haven't debugged this further. Please let me know if issue is
> > > > reproducible with multiple vCPUs at your end. For now will focus on
> > > > VFIO dev specific tests.
> >
> > Oh. My test environment has been a single-core vCPU. So that doesn't
> > happen to me. Can you try a vanilla QEMU branch that our nesting
> > branch is rebased on? I took a branch from Yi as the baseline, while
> > he might take from Eric for the rfcv3.
> >
> > I am guessing that it might be an issue in the common tree.
> 
> Yes, that looks like the case.
> I tried with:
>  commit 13356edb8750("Merge tag 'block-pull-request' of
> https://gitlab.com/stefanha/qemu into staging")
> 
> And issue is still there. So hopefully once we rebase everything it will go
> away.

Hi Nicolin,

I rebased your latest Qemu branch[1] on top of v7.2.0 and have not observed
the above issue so far. However, I noticed a couple of other issues when
we try to hot add/remove devices.

(qemu) device_del net1
qemu-system-aarch64-iommufd: Failed to free id: 4 Inappropriate ioctl for device
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: Failed to free id:1 Device or resource busy
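
As a rough aid for reading the log above, here is a minimal Python sketch that
pulls the failed unmap ranges out of the vfio_dma_unmap() lines and maps the
negative return value to its errno name. It assumes the second and third
arguments printed are the IOVA and the size (the first is treated as an opaque
pointer); the helper name parse_unmap_failures is only for illustration and is
not part of Qemu.

```python
import errno
import re

# vfio_dma_unmap lines copied from the device_del output above.
LOG = """\
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000101000, 0xf000) = -2 (No such file or directory)
qemu-system-aarch64-iommufd: IOMMU_IOAS_UNMAP failed: No such file or directory
qemu-system-aarch64-iommufd: vfio_dma_unmap(0xaaaaf587a3d0, 0x8000000000, 0x100000) = -2 (No such file or directory)
"""

# Assumed layout: vfio_dma_unmap(<pointer>, <iova>, <size>) = <return value>
UNMAP_RE = re.compile(
    r"vfio_dma_unmap\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\) = (-?\d+)"
)


def parse_unmap_failures(log):
    """Return (iova_start, iova_end, errno_name) for each failed unmap."""
    failures = []
    for match in UNMAP_RE.finditer(log):
        iova = int(match.group(2), 16)
        size = int(match.group(3), 16)
        ret = int(match.group(4))
        if ret < 0:
            # A negative return value is taken to be -errno.
            name = errno.errorcode.get(-ret, "unknown")
            failures.append((iova, iova + size, name))
    return failures


if __name__ == "__main__":
    for start, end, name in parse_unmap_failures(LOG):
        print(f"[{start:#x}, {end:#x}) failed with {name}")
```

Both unmap failures decode to ENOENT, which matches the "No such file or
directory" text in the log.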

Ignoring the MMIO UNMAP errors, it looks like the objects are not freed
properly on the dev removal path. I have a few quick fixes for this here:
https://github.com/hisilicon/qemu/tree/private-v7.2.0-iommufd-nesting

With the above, it seems the HWPT/IOAS objects are destroyed properly
on the dev detach path. But when the dev is added back, Qemu gets a seg fault,
and so far I have no clue why that happens.

(qemu) device_add vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1
./qemu_run-iommufd-nested: line 13:  7041 Segmentation fault
(core dumped) ./qemu-system-aarch64-iommufd
-machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0
-enable-kvm -cpu host -m 1G -smp cpus=8,maxcpus=8 -object
iommufd,id=iommufd0 -bios QEMU_EFI_Dec2018.fd -kernel
Image-iommufd -initrd rootfs-iperf.cpio -device
ioh3420,id=rp1 -device
vfio-pci,host=0000:7d:02.1,iommufd=iommufd0,bus=rp1,id=net1 -append
"rdinit=init console=ttyAMA0 root=/dev/vda rw
earlycon=pl011,0x9000000" -net none -nographic -trace events=events -D
trace_iommufd

There is no kernel log/crash and not much useful trace output while this happens.
I understand these are early days and it is not robust in any way, but please
let me know if you suspect anything. I will continue debugging and will update
if I find anything.

Thanks,
Shameer

[1] https://github.com/nicolinc/qemu/tree/wip/iommufd_rfcv3%2Bnesting%2Bsmmuv3





