On Sun, 3 Mar 2024 22:20:33 +0000
Xu Liu <liuxu@xxxxxxxx> wrote:

> Hello,
>
> Recently I am running my programs in QEMU (x86_64) with “-accel=kvm”.
> The QEMU version is 6.0.0.
>
> I run my programs in two ways:
>
> 1. I pass through my device through vfio-pci to QEMU, this way
> works well.
>
> 2. I write an emulated PCI device for QEMU, and run my programs on
> the emulated PCI device. This crashes when the code try to do memory
> copy to PCI device when the data length is longer than 16 bytes.
> While the passthrough device works well for the same situation.
>
>
> After dump the assembly code. I noticed when the data is <= 16
> bytes, the mov assembly code is chosen, and it works well.
>
> When the data is > 16 bytes, the vmovdqu assembly code is chosen,
> and it crashes with “illegal operand”.
>
> Given the code and data are exactly same for both passthrough device
> and emulated device. I am curious about why this happens.
>
> After turn on kernel trace for kvm by
> echo kvm:* > /sys/kernel/debug/tracing/set_event
> And rerun the QEMU and my code
> for both passthrough device and emulated device, I noticed that:
>
> 1) for passthrough device, I didn’t see any trace events related to
> my gva and gpa. This makes me think that the memory copy to PCI
> device went through different code path . It is handled by the guest
> OS without exit to VMX.
>
> 2) for emulated device, if I use compiler flag
> target-feature=-avx,-avx2 to force compiler use mov assembly code,
> I can see the memory copy goes through the KVM_EXIT_MMIO, and
> everything works well. if I don’t force the compiler use mov , the
> compiler just chooses the vmovdqu , which just crash the programs,
> and no KVM_EXIT_MMIO related to my memory copy appears in the trace
> events. Looks like the guest OS handles the crash.
>
>
> Any clue about why the vmovdqu works for passthrough device but not
> work for emulated device.

For an assigned device, the device MMIO space is mapped directly into
the VM address space (assuming the PCI BAR is at least PAGE_SIZE), so
there's no emulation of the access. You can disable this with the
x-no-mmap=on option for the vfio-pci device, in which case I'd guess it
behaves the same as your emulated device (assuming we really don't
reach QEMU for the access).

Since you're not seeing a KVM_EXIT_MMIO, I'd guess this is more of a
KVM issue than a QEMU one (Cc kvm list). Possibly KVM doesn't emulate
vmovdqu for an MMIO access, but honestly I'm not positive that AVX
instructions are meant to work on MMIO space. I'll let x86 KVM experts
more familiar with specific opcode semantics weigh in on that.

Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
acquired through pci-sysfs or a userspace vfio-pci driver within the
guest?

In QEMU 4a2e242bbb30 ("memory: Don't use memcpy for ram_device
regions") we resolved an issue[1] where QEMU itself was doing a
memcpy() to assigned device MMIO space, which broke the functionality
of the device. IIRC memcpy() was using an SSE instruction that didn't
fault, but didn't work correctly on MMIO space either.

So I also wouldn't rule out that the program is inherently misbehaving
by using memcpy() and thereby ignoring the nature of the device MMIO
access semantics. Thanks,

Alex

[1] https://bugs.launchpad.net/qemu/+bug/1384892
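
To illustrate the point about MMIO access semantics, here is a minimal
sketch of copying a buffer into a BAR mapping with explicit fixed-width
stores instead of memcpy(). Declaring the destination volatile means
compilers in practice emit one plain 32-bit store per word rather than
a wide vector move such as vmovdqu. The function name and the choice of
32-bit accesses are assumptions made for the example; the correct
access width depends on the device, and I have not tested this against
the setup described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes from src into an MMIO window
 * using only aligned 32-bit stores.
 *
 * Returns 0 on success, or -1 if len is not a multiple of 4.
 * The 32-bit width is an assumption; use whatever the device requires.
 */
static int mmio_write32_buf(volatile uint32_t *dst, const void *src,
                            size_t len)
{
    const uint8_t *p = src;

    /* The mapping itself must be 4-byte aligned for 32-bit stores. */
    assert(((uintptr_t)dst % sizeof(uint32_t)) == 0);

    if (len % sizeof(uint32_t) != 0)
        return -1;

    for (size_t i = 0; i < len / sizeof(uint32_t); i++) {
        uint32_t word;

        /* src is ordinary RAM and may be unaligned, so load via memcpy. */
        memcpy(&word, p + i * sizeof(word), sizeof(word));

        /* Volatile store: one 32-bit access per word, in order. */
        dst[i] = word;
    }
    return 0;
}
```

With a BAR obtained via mmap(), the call would look something like
mmio_write32_buf((volatile uint32_t *)bar, data, data_len), where bar,
data and data_len are placeholders for the caller's own variables.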