On Tue, 2024-01-16 at 18:08 +0800, Baochen Qiang wrote:
> 
> 
> On 1/16/2024 1:46 AM, Alex Williamson wrote:
> > On Sun, 14 Jan 2024 16:36:02 +0200
> > Kalle Valo <kvalo@xxxxxxxxxx> wrote:
> > 
> > > Baochen Qiang <quic_bqiang@xxxxxxxxxxx> writes:
> > > 
> > > > > > Strange that still fails. Are you now seeing this error in your
> > > > > > host or your Qemu? or both?
> > > > > > Could you share your test steps? And if you can share please be as
> > > > > > detailed as possible since I'm not familiar with passing WLAN
> > > > > > hardware to a VM using vfio-pci.
> > > > > 
> > > > > Just in Qemu, the hardware works fine on my host machine.
> > > > > I basically follow this guide to set it up, it's written in the
> > > > > context of GPUs/libvirt but the host setup is exactly the same. By
> > > > > no means do you need to read it all, once you set the vfio-pci.ids
> > > > > and see your unclaimed adapter you can stop:
> > > > > https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF
> > > > > In short you should be able to set the following host kernel options
> > > > > and reboot (assuming your motherboard/hardware is compatible):
> > > > > intel_iommu=on iommu=pt vfio-pci.ids=17cb:1103
> > > > > Obviously change the device/vendor IDs to whatever ath11k hw you
> > > > > have. Once the host is rebooted you should see your wlan adapter as
> > > > > UNCLAIMED, showing the driver in use as vfio-pci. If not, it's likely
> > > > > your motherboard just isn't compatible, the device has to be in its
> > > > > own IOMMU group (you could try switching PCI ports if this is the
> > > > > case).
> > > > > I then build a "kvm_guest.config" kernel with the driver/firmware
> > > > > for ath11k and boot into that with the following Qemu options:
> > > > > -enable-kvm -device vfio-pci,host=<PCI address>
> > > > > If it seems easier you could also utilize IWD's test-runner which
> > > > > handles launching the Qemu kernel automatically, detecting any
> > > > > vfio-devices and passes them through and mounts some useful host
> > > > > folders into the VM. It's actually a very good general purpose tool
> > > > > for kernel testing, not just for IWD:
> > > > > https://git.kernel.org/pub/scm/network/wireless/iwd.git/tree/doc/test-runner.txt
> > > > > Once set up you can just run test-runner with a few flags and you'll
> > > > > boot into a shell:
> > > > > ./tools/test-runner -k <kernel-image> --hw --start /bin/bash
> > > > > Please reach out if you have questions, thanks for looking into
> > > > > this.
> > > > Thanks for these details. I reproduced this issue by following your guide.
> > > > Seems the root cause is that the MSI vector assigned to WCN6855 in
> > > > qemu is different from that in the host. In my case the MSI vector in qemu
> > > > is [Address: fee00000 Data: 0020] while in host it is [Address:
> > > > fee00578 Data: 0000]. So in qemu ath11k configures MSI vector
> > > > [Address: fee00000 Data: 0020] to WCN6855 hardware/firmware, and
> > > > firmware uses that vector to fire interrupts to host/qemu.
> > > > However host IOMMU doesn't know that vector because the real vector is
> > > > [Address: fee00578 Data: 0000], as a result host blocks that
> > > > interrupt and reports an error, see below log:
> > > > 
> > > > [ 1414.206069] DMAR: DRHD: handling fault status reg 2
> > > > [ 1414.206081] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
> > > > 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
> > > > request
> > > > [ 1414.210334] DMAR: DRHD: handling fault status reg 2
> > > > [ 1414.210342] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
> > > > 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
> > > > request
> > > > [ 1414.212496] DMAR: DRHD: handling fault status reg 2
> > > > [ 1414.212503] DMAR: [INTR-REMAP] Request device [02:00.0] fault index
> > > > 0x0 [fault reason 0x25] Blocked a compatibility format interrupt
> > > > request
> > > > [ 1414.214600] DMAR: DRHD: handling fault status reg 2
> > > > 
> > > > While I don't think there is a way for qemu/ath11k to get the real MSI
> > > > vector from host, I will try to read the vfio code to check further.
> > > > Before that, to unblock you, a possible hack is to hard code the MSI
> > > > vector in qemu to the same as in host, on condition that the MSI
> > > > vector doesn't change.
> > > > 
> > > Baochen, awesome that you were able to debug this further. Now we at
> > > least know what's the problem.
> > 
> > It's an interesting problem, I don't think we've seen another device
> > where the driver reads the MSI register in order to program another
> > hardware entity to match the MSI address and data configuration.
> > 
> > When assigning a device, the host and guest use entirely separate
> > address spaces for MSI interrupts. When the guest enables MSI, the
> > operation is trapped by the VMM and triggers an ioctl to the host to
> > perform an equivalent configuration. Generally the physical device
> > will interrupt within the host where it may be directly attached to KVM
> > to signal the interrupt, trigger through the VMM, or where
> > virtualization hardware supports it, the interrupt can directly trigger
> > the vCPU. From the VM perspective, the guest address/data pair is used
> > to signal the interrupt, which is why it makes sense to virtualize the
> > MSI registers.
> 
> Hi Alex, could you help elaborate more? Why is MSI virtualization
> necessary from the VM perspective?

An MSI is just a write to physical memory space. You can even use it like that; configure the device to just write 4 bytes to some address in a struct in memory to show that it needs attention, and you then poll that memory. But mostly we don't (ab)use it like that, of course.

We tell the device to write to a special range of the physical address space where the interrupt controller lives — the range from 0xfee00000 to 0xfeefffff. The low 20 bits of the address, and the 32 bits of data written to that address, tell the interrupt controller which CPU to interrupt, and which vector to raise on the CPU (as well as some other details and weird interrupt modes which are theoretically encodable).

So in your example, the guest writes [Address: fee00000 Data: 0020] which means it wants vector 0x20 on CPU#0 (well, the CPU with APICID 0). But that's what the *guest* wants. If we just blindly programmed that into the hardware, the hardware would deliver vector 0x20 to the host's CPU0... which would be very confused by it. The host has a driver for that device, probably the VFIO driver.
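(As an aside, the compatibility-format encoding described above can be decoded mechanically. The sketch below is purely illustrative, not code from any driver; the helper name is made up and only the destination APIC ID and vector fields are pulled out, ignoring the other mode bits.)

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative decode of a compatibility-format MSI address/data pair:
 * address bits 19:12 carry the destination APIC ID, data bits 7:0 carry
 * the vector. Delivery mode, trigger mode etc. are ignored for brevity.
 */
static void decode_compat_msi(uint32_t addr, uint32_t data)
{
	uint32_t dest_apicid = (addr >> 12) & 0xff;
	uint32_t vector = data & 0xff;

	printf("MSI addr 0x%08x data 0x%04x -> APIC ID %u, vector 0x%02x\n",
	       addr, data, dest_apicid, vector);
}

int main(void)
{
	decode_compat_msi(0xfee00000, 0x0020);	/* the guest's pair from the thread: CPU 0, vector 0x20 */
	return 0;
}

For what it's worth, the host pair from the log above (fee00578/0000) has address bit 4 set, which VT-d uses to mark a remappable-format request that indexes the IOMMU's redirection table, so a direct decode like this only makes sense for the guest's compatibility-format value; that also lines up with the "Blocked a compatibility format interrupt request" faults in the DMAR log.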
The host registers its own interrupt handlers for the real hardware, and decides which *host* CPU (and vector) should be notified when something happens. And when that happens, the VFIO driver will raise an event on an eventfd, which will notify QEMU to inject the appropriate interrupt into the guest.

So... when the guest enables the MSI, that's trapped by QEMU, which remembers which *guest* CPU/vector the interrupt should go to. QEMU tells VFIO to enable the corresponding interrupt, and what gets programmed into the actual hardware is up to the *host* operating system; nothing to do with the guest's information at all. Then when the actual hardware raises the interrupt, the VFIO interrupt handler runs in the host, signals an event on the eventfd, and QEMU receives that and injects the event into the appropriate guest vCPU.

(In practice QEMU doesn't do it these days; there's actually a shortcut which improves latency by allowing the kernel to deliver the event to the guest directly, connecting the eventfd directly to the KVM irq routing table.)

Interrupt remapping is probably not important here, but I'll explain it briefly anyway. With interrupt remapping, the IOMMU handles the 'memory' write from the device, just as it handles all other memory transactions. One of the reasons for interrupt remapping is that the original definitions of the bits in the MSI (the low 20 bits of the address and the 32 bits of what's written) only had 8 bits for the target CPU APICID. And we have bigger systems than that now. So by using one of the spare bits in the MSI message, we can indicate that this isn't just a directly-encoded cpu/vector in "Compatibility Format", but is a "Remappable Format" interrupt. Instead of the cpu/vector it just contains an index into the IOMMU's Interrupt Redirection Table, which *does* have a full 32 bits for the target APIC ID. That's why x2apic support (which gives us support for >254 CPUs) depends on interrupt remapping.

The other thing that the IOMMU can do in modern systems is *posted* interrupts, where the entry in the IOMMU's IRT doesn't just specify the host's CPU/vector, but actually specifies a *vCPU* to deliver the interrupt to. All of which is mostly irrelevant as it's just another bypass optimisation to improve latency.

The key here is that what the guest writes to its emulated MSI table and what the host writes to the real hardware are not at all related. If we had had this posted interrupt support from the beginning, perhaps we could have had a much simpler model — we just let the guest write its intended (v)CPU#/vector *directly* to the MSI table in the device, and let the IOMMU fix it up by having a table pointing to the appropriate set of vCPUs. But that isn't how it happened.

The model we have is that the VMM has to *emulate* the config space and handle the interrupts as described above. This means that whenever a device has a non-standard way of configuring MSIs, the VMM has to understand and intercept that. I believe we've even seen some Atheros devices with the MSI target in some weird MMIO registers instead of the standard location, so we've had to hack QEMU to handle those too?

> And, maybe a stupid question, is it possible that VM/KVM or vfio only
> virtualize the write operation to the MSI register but leave the read
> operation un-virtualized? I am asking this because in that way ath11k may
> get a chance to run in VM after getting the real vector.

That might confuse a number of operating systems, especially ones that mask/unmask by reading the register, flipping the mask bit and writing it back again. How exactly is the content of this register then given back to the firmware? Is that communication snoopable by the VMM?
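(For illustration: those reads go through the standard MSI capability in config space, which is exactly the part the VMM emulates. The fragment below is a rough sketch of the generic pattern a driver uses to read back the programmed address/data with the standard PCI helpers; it is not the actual ath11k code and the function name is invented.)

#include <linux/pci.h>

/*
 * Illustrative only: read back the MSI address/data from the standard MSI
 * capability, roughly what a driver has to do if it wants to hand the
 * values to firmware. In a VM these accesses hit the emulated capability,
 * so the driver sees the guest's pair, never the host's.
 */
static int example_read_msi_target(struct pci_dev *pdev, u64 *addr, u16 *data)
{
	u16 flags;
	u32 lo, hi = 0;

	if (!pdev->msi_cap)
		return -ENODEV;

	pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_FLAGS, &flags);
	pci_read_config_dword(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, &lo);

	if (flags & PCI_MSI_FLAGS_64BIT) {
		pci_read_config_dword(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI, &hi);
		pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_DATA_64, data);
	} else {
		pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_DATA_32, data);
	}

	*addr = ((u64)hi << 32) | lo;
	return 0;
}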
> > Off hand I don't have a good solution for this, the hardware is
> > essentially imposing a unique requirement for MSI programming that the
> > driver needs visibility of the physical MSI address and data.

Strictly, the driver doesn't need visibility of the actual values used by the hardware. Another way of looking at it would be to say that the driver programs the MSI through this non-standard method; it just needs the VMM to trap and handle that, just as the VMM does for the standard MSI table. Which is what I thought we'd already seen on some Atheros devices.

> > It's
> > conceivable that device specific code could either make the physical
> > address/data pair visible to the VM or trap the firmware programming to
> > inject the correct physical values. Is there somewhere other than the
> > standard MSI capability in config space that the driver could learn the
> > physical values, ie. somewhere that isn't virtualized? Thanks,

> I don't think we have such capability in configuration space.

Configuration space is a complete fiction though; it's all emulated. We can do anything we like. Or we can have a PV hypercall which will report it. I don't know that we'd *want* to, but all things are possible.
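(To make the plumbing described earlier in the thread concrete, here is a stripped-down sketch of the host-side sequence when the VMM enables MSI for an assigned device: hand an eventfd to VFIO as the trigger for the vector, then wire the same eventfd into KVM so injection can bypass userspace. This is illustrative only, with no error handling; it is not QEMU's code, and device_fd, vm_fd and gsi are assumed to have been set up elsewhere, with the guest's address/data pair already given to KVM via its MSI routing table.)

#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include <linux/kvm.h>

/* Sketch: enable one MSI vector on a VFIO device and route it into KVM. */
static int wire_up_msi(int device_fd, int vm_fd, unsigned int gsi)
{
	int efd = eventfd(0, EFD_CLOEXEC);

	/* Tell VFIO to signal this eventfd when the (host-programmed) MSI fires. */
	struct vfio_irq_set *set;
	size_t sz = sizeof(*set) + sizeof(int);

	set = calloc(1, sz);
	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSI_IRQ_INDEX;
	set->start = 0;
	set->count = 1;
	memcpy(set->data, &efd, sizeof(int));
	ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);

	/* Shortcut: let KVM inject the guest interrupt directly from the eventfd. */
	struct kvm_irqfd irqfd = {
		.fd = efd,
		.gsi = gsi,	/* guest MSI route previously set up via KVM_SET_GSI_ROUTING */
	};
	ioctl(vm_fd, KVM_IRQFD, &irqfd);

	return efd;
}

The point of the sketch is that nothing in this sequence ever tells the physical device the guest's address/data pair; the host picks its own values when VFIO enables the interrupt.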