On Sat, Apr 18, 2020 at 09:41:27AM +0300, Jon Doron wrote:
> On 17/04/2020, Roman Kagan wrote:
> > On Thu, Apr 16, 2020 at 03:54:30PM +0300, Jon Doron wrote:
> > > On 16/04/2020, Roman Kagan wrote:
> > > > On Thu, Apr 16, 2020 at 11:38:46AM +0300, Jon Doron wrote:
> > > > > According to the TLFS:
> > > > > "A write to the end of message (EOM) register by the guest causes the
> > > > > hypervisor to scan the internal message buffer queue(s) associated with
> > > > > the virtual processor.
> > > > >
> > > > > If a message buffer queue contains a queued message buffer, the hypervisor
> > > > > attempts to deliver the message.
> > > > >
> > > > > Message delivery succeeds if the SIM page is enabled and the message slot
> > > > > corresponding to the SINTx is empty (that is, the message type in the
> > > > > header is set to HvMessageTypeNone).
> > > > > If a message is successfully delivered, its corresponding internal message
> > > > > buffer is dequeued and marked free.
> > > > > If the corresponding SINTx is not masked, an edge-triggered interrupt is
> > > > > delivered (that is, the corresponding bit in the IRR is set).
> > > > >
> > > > > This register can be used by guests to poll for messages. It can also be
> > > > > used as a way to drain the message queue for a SINTx that has
> > > > > been disabled (that is, masked)."
> > > >
> > > > Doesn't this work already?
> > > >
> > >
> > > Well, if you don't have SCONTROL and a GSI associated with the SINT, then it
> > > does not...
> >
> > Yes, you do need both of these.
> >
> > > > > So basically this means that we need to exit on EOM so the hypervisor
> > > > > will have a chance to send all the pending messages regardless of the
> > > > > SCONTROL mechanism.
> > > >
> > > > I might be misinterpreting the spec, but my understanding is that
> > > > SCONTROL {en,dis}ables the message queueing completely.
> > > > What the quoted part means is that a write to EOM should trigger the
> > > > message source to push a new message into the slot, regardless of
> > > > whether the SINT was masked or not.
> > > >
> > > > And this (I think, haven't tested) should already work.  The userspace
> > > > just keeps using the SINT route as it normally does, posting
> > > > notifications to the corresponding irqfd when posting a message, and
> > > > waiting on the resamplefd for the message slot to become free.  If the
> > > > SINT is masked, KVM will skip injecting the interrupt, that's it.
> > > >
> > > > Roman.
> > >
> > > That's what I was thinking originally as well, but then I noticed KDNET as a
> > > VMBus client (and it basically runs before anything else) is working in this
> > > polling mode, where SCONTROL is disabled and it just loops, and if it sees
> > > a PENDING message flag it issues an EOM to indicate it has freed
> > > the slot.
> >
> > Who sets up the message page then?  Doesn't it enable SCONTROL as well?
> >
>
> KdNet is the one setting the SIMP and it's not setting the SCONTROL; I'll
> paste output of KVM traces for the relevant MSRs.
>
> > Note that, even if you don't see it being enabled by Windows, it can be
> > enabled by the firmware and/or by the bootloader.
> >
> > Can you perhaps try with the SeaBIOS from
> > https://src.openvz.org/projects/UP/repos/seabios branch hv-scsi?  It
> > enables SCONTROL and leaves it that way.
> >
> > I'd also suggest tracing kvm_msr events (both reads and writes) for
> > SCONTROL and SIMP MSRs, to better understand the picture.
> >
> > So far the change you propose appears too heavy to work around the
> > problem of disabled SCONTROL.  You seem to be better off just making
> > sure it's enabled (either by the firmware or by slightly violating the spec
> > and initializing it to enabled from the start), and sticking to the
> > existing infrastructure for posting messages.
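To make the posting flow described above a bit more concrete, here is a simplified C model of posting a message into a SIM slot, loosely following the TLFS text quoted earlier. Nothing is delivered unless SCONTROL and SIMP are enabled (treating SCONTROL as the global switch discussed in this thread), a busy slot (message type other than HvMessageTypeNone) gets its msg_pending flag set and the post is retried later, and a masked SINT receives the message without an interrupt. All names (struct synic_model, post_message(), guest_consume()) are hypothetical, the counter merely stands in for signalling the irqfd, and the resamplefd-driven retry is modelled by calling post_message() again. It is a sketch, not QEMU or KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified SynIC model; not QEMU or KVM code. */
#define SINT_COUNT       16
#define HVMSG_NONE       0u    /* HvMessageTypeNone: the slot is free */
#define MSG_PENDING_FLAG 0x1u  /* msg_pending bit in the header's message_flags */

struct msg_slot {
    uint32_t message_type;
    uint8_t  message_flags;
    /* payload omitted */
};

struct synic_model {
    bool scontrol_enabled;            /* SCONTROL bit 0 */
    bool simp_enabled;                /* SIMP bit 0 */
    bool sint_masked[SINT_COUNT];
    struct msg_slot sim[SINT_COUNT];  /* one message slot per SINT */
    unsigned irqs[SINT_COUNT];        /* stands in for signalling the irqfd */
};

enum post_result { POST_DELIVERED, POST_RETRY_LATER, POST_DISABLED };

static void synic_model_init(struct synic_model *s)
{
    memset(s, 0, sizeof(*s));
}

/* Host side: try to put a message of `type` into the slot for `sint`. */
static enum post_result post_message(struct synic_model *s, unsigned sint,
                                     uint32_t type)
{
    assert(sint < SINT_COUNT && type != HVMSG_NONE);
    if (!s->scontrol_enabled || !s->simp_enabled)
        return POST_DISABLED;

    struct msg_slot *slot = &s->sim[sint];
    if (slot->message_type != HVMSG_NONE) {
        /* Slot busy: flag it so the guest writes EOM once it frees the
         * slot; the host retries when the resamplefd fires. */
        slot->message_flags |= MSG_PENDING_FLAG;
        return POST_RETRY_LATER;
    }

    slot->message_type = type;
    slot->message_flags = 0;
    if (!s->sint_masked[sint])
        s->irqs[sint]++;  /* masked SINT: message lands, no interrupt */
    return POST_DELIVERED;
}

/* Guest side (polling mode): free the slot, report whether EOM is needed. */
static bool guest_consume(struct synic_model *s, unsigned sint)
{
    assert(sint < SINT_COUNT);
    struct msg_slot *slot = &s->sim[sint];
    bool need_eom = (slot->message_flags & MSG_PENDING_FLAG) != 0;

    slot->message_type = HVMSG_NONE;
    return need_eom;
}
```

In this model a POST_RETRY_LATER result followed by guest_consume() returning true corresponds to the guest writing EOM, after which the host's retry succeeds.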
> >
>
> I guess there is something I'm missing here, but let's say the BIOS would
> have set the SCONTROL but the OS is not setting it: who is in charge of
> handling the interrupts?

SCONTROL doesn't enable the interrupts, it enables the SynIC as a whole.
The interrupts are enabled via individual SINTx MSRs.  This SeaBIOS
branch does exactly this: it enables the SynIC via SCONTROL, and then
specific SynIC functionality via SIMP/SIEFP, but doesn't activate SINTx
and works in polling mode.

I agree that this global SCONTROL switch seems redundant, but it appears
to match the spec.

> > > (There are a bunch of patches I sent on the QEMU mailing list as well where
> > > I CCed you, I will probably revise them a bit but was hoping to get KVM
> > > sorted out first).
> >
> > I'll look through the archive, should be there, thanks.
> >
> > Roman.
>
> I tried testing with both the SeaBIOS branch you have suggested and the
> EDK2; unfortunately I could not get the EDK2 build to identify my VM drive
> to boot from (not sure why).
>
> Here is an output of KVM trace for the relevant MSRs (SCONTROL and SIMP)
>
> QEMU Default BIOS
> -----------------
> qemu-system-x86-613 [000] .... 1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
> qemu-system-x86-613 [000] .... 1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
> qemu-system-x86-613 [000] .N.. 1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
> qemu-system-x86-613 [000] .N.. 1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
> Choose Windows DebugEntry
> qemu-system-x86-613 [001] .... 1165.185227: kvm_msr: msr_read 40000083 = 0x0
> qemu-system-x86-613 [001] .... 1165.185255: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
> qemu-system-x86-613 [001] .... 1165.185255: kvm_msr: msr_write 40000083 = 0xfa1001
> qemu-system-x86-613 [001] .... 1165.193206: kvm_msr: msr_read 40000083 = 0xfa1001
> qemu-system-x86-613 [001] .... 1165.193236: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
> qemu-system-x86-613 [001] .... 1165.193237: kvm_msr: msr_write 40000083 = 0xfa1000
>
>
> SeaBIOS hv-scsi
> ---------------
> qemu-system-x86-656 [001] .... 1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
> qemu-system-x86-656 [001] .... 1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
> qemu-system-x86-656 [001] .... 1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
> qemu-system-x86-656 [001] .... 1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1

Initialization (host == 1)

> qemu-system-x86-656 [001] .... 1313.156675: kvm_msr: msr_read 40000083 = 0x0
> qemu-system-x86-656 [001] .... 1313.156680: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x7fffe001 host 0
> Choose Windows DebugEntry

I guess this is a bit misplaced timewise; the BIOS is still working here.

> qemu-system-x86-656 [001] .... 1313.156680: kvm_msr: msr_write 40000083 = 0x7fffe001

BIOS sets up the message page

> qemu-system-x86-656 [001] .... 1313.162111: kvm_msr: msr_read 40000080 = 0x0
> qemu-system-x86-656 [001] .... 1313.162118: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x1 host 0
> qemu-system-x86-656 [001] .... 1313.162119: kvm_msr: msr_write 40000080 = 0x1

BIOS activates SCONTROL

> qemu-system-x86-656 [001] .... 1313.246758: kvm_msr: msr_read 40000083 = 0x7fffe001
> qemu-system-x86-656 [001] .... 1313.246764: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 0
> qemu-system-x86-656 [001] .... 1313.246764: kvm_msr: msr_write 40000083 = 0x0

BIOS clears the message page (it's not needed once the VMBus device has
been brought up)

I guess the choice of Windows DebugEntry appeared somewhere here.

> qemu-system-x86-656 [001] .... 1348.904727: kvm_msr: msr_read 40000083 = 0x0
> qemu-system-x86-656 [001] .... 1348.904771: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
> qemu-system-x86-656 [001] .... 1348.904772: kvm_msr: msr_write 40000083 = 0xfa1001

Bootloader (debug stub?) sets up the message page

> qemu-system-x86-656 [001] .... 1348.919170: kvm_msr: msr_read 40000083 = 0xfa1001
> qemu-system-x86-656 [001] .... 1348.919183: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
> qemu-system-x86-656 [001] .... 1348.919183: kvm_msr: msr_write 40000083 = 0xfa1000

The message page is being disabled again.

I guess you only filtered SCONTROL and SIMP, skipping e.g. SVERSION,
GUEST_OS_ID, HYPERCALL, etc., which are also part of the exchange here.

> I could not get the EDK2 setup to work though
> (https://src.openvz.org/projects/UP/repos/edk2 branch hv-scsi).
>
> It does not detect my VM hard drive, not sure why. This is how I configured
> it:
> -drive file=./win10.qcow2,format=qcow2,if=none,id=drive_disk0 \
> -device virtio-blk-pci,drive=drive_disk0 \
>
> (Is there something special I need to configure in order for it to work? I
> tried building EDK2 with and without SMM_REQUIRE and SECURE_BOOT_ENABLE.)

No special configuration I can think of.

> But in general it sounds like there is something I don't fully understand:
> when SCONTROL is enabled, a GSI is associated with this SintRoute.
>
> Then when the guest triggers an EOI via the APIC we will trigger the GSI
> notification, which will give us another go at trying to copy the message
> into its slot.

Right.

> So is it the OS that is in charge of setting the EOI?

Yes.

> If so, then it needs to
> be aware of SCONTROL being enabled, and just having it left set by the BIOS
> might not be enough?

Yes, it needs to be aware of SCONTROL being enabled.  However, this
awareness may be based on a pure assumption that the previous entity
(BIOS or bootloader) did it already.
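For reading the traces above, a small decoder may help. It assumes the TLFS register layouts: bit 0 of SCONTROL enables the SynIC, and in SIMP bit 0 enables the message page while bits 63:12 hold its guest physical page address. The helper names are made up for illustration; only the MSR indices 0x40000080 (SCONTROL) and 0x40000083 (SIMP) come from the traces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* MSR indices as they appear in the traces. */
#define HV_X64_MSR_SCONTROL 0x40000080u
#define HV_X64_MSR_SIMP     0x40000083u

#define SYNIC_ENABLE_BIT 0x1ULL       /* bit 0 in both SCONTROL and SIMP */
#define GPA_PAGE_MASK    (~0xfffULL)  /* SIMP bits 63:12: page address */

static bool synic_enabled(uint64_t scontrol)
{
    return (scontrol & SYNIC_ENABLE_BIT) != 0;
}

static bool simp_enabled(uint64_t simp)
{
    return (simp & SYNIC_ENABLE_BIT) != 0;
}

static uint64_t simp_page_gpa(uint64_t simp)
{
    return simp & GPA_PAGE_MASK;
}

/* Format a one-line reading of a traced MSR write into buf.
 * Returns snprintf's result (length of the full description). */
static int describe_msr_write(uint32_t msr, uint64_t data, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    switch (msr) {
    case HV_X64_MSR_SCONTROL:
        return snprintf(buf, len, "SCONTROL: SynIC %s",
                        synic_enabled(data) ? "enabled" : "disabled");
    case HV_X64_MSR_SIMP:
        return snprintf(buf, len, "SIMP: page 0x%llx %s",
                        (unsigned long long)simp_page_gpa(data),
                        simp_enabled(data) ? "enabled" : "disabled");
    default:
        return snprintf(buf, len, "MSR 0x%x: not decoded", (unsigned)msr);
    }
}
```

With these helpers, 0xfa1001 reads as the message page at GPA 0xfa1000 being enabled, 0xfa1000 as the same page being disabled, and the SeaBIOS value 0x7fffe001 as a page at 0x7fffe000 being enabled.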
> Also in the TLFS (looking at v6) they mention that message queueing has "3
> exit conditions", which will cause the hypervisor to try and attempt to
> deliver the additional messages.
>
> The 3 exit conditions they refer to are:
> * Another message buffer is queued.
> * The guest indicates the “end of interrupt” by writing to the APIC’s EOI
> register.
> * The guest indicates the “end of message” by writing to the SynIC’s EOM
> register.
>
> Also notice this additional exit is only if there is a pending message and
> not for every EOM.

This meaning of "exit" doesn't trivially correspond to what we have in
KVM.  A write to an MSR does cause a vmexit.  Then KVM notifies resample
eventfds for all SINTs that have them set up, no matter whether there's a
pending message in the slot.

It may be slightly more efficient to notify only those that have
indicated a pending message, but I don't see the current behavior
breaking anything or violating the spec, so, as EOMs are not used on
fast paths, I wouldn't bother optimizing.

Roman.
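A simplified model of the EOM handling described above, contrasting the current behavior (notify every SINT that has a resample eventfd set up) with the possible refinement of notifying only SINTs whose slot has msg_pending set. struct sint_state and both functions are hypothetical, and the counter stands in for the real eventfd notification; the point is only that the first variant produces extra notifications, which, per the reasoning above, are acceptable on a non-fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of per-SINT state relevant to EOM handling. */
#define SINT_COUNT       16
#define MSG_PENDING_FLAG 0x1u

struct sint_state {
    bool has_resamplefd;     /* userspace set up a SINT route with a resample eventfd */
    uint8_t slot_flags;      /* message_flags of this SINT's message slot */
    unsigned notifications;  /* counts resample notifications in this model */
};

/* Current behavior as described: notify every SINT that has a resamplefd. */
static unsigned eom_notify_all(struct sint_state sints[SINT_COUNT])
{
    unsigned n = 0;

    assert(sints != NULL);
    for (size_t i = 0; i < SINT_COUNT; i++) {
        if (sints[i].has_resamplefd) {
            sints[i].notifications++;
            n++;
        }
    }
    return n;
}

/* Possible refinement: skip SINTs whose slot does not have msg_pending set. */
static unsigned eom_notify_pending(struct sint_state sints[SINT_COUNT])
{
    unsigned n = 0;

    assert(sints != NULL);
    for (size_t i = 0; i < SINT_COUNT; i++) {
        if (sints[i].has_resamplefd &&
            (sints[i].slot_flags & MSG_PENDING_FLAG)) {
            sints[i].notifications++;
            n++;
        }
    }
    return n;
}
```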