Re: [PATCH v2 0/1] x86/kvm/hyper-v: Add support to SYNIC exit on EOM


 



On Sat, Apr 18, 2020 at 09:41:27AM +0300, Jon Doron wrote:
> On 17/04/2020, Roman Kagan wrote:
> > On Thu, Apr 16, 2020 at 03:54:30PM +0300, Jon Doron wrote:
> > > On 16/04/2020, Roman Kagan wrote:
> > > > On Thu, Apr 16, 2020 at 11:38:46AM +0300, Jon Doron wrote:
> > > > > According to the TLFS:
> > > > > "A write to the end of message (EOM) register by the guest causes the
> > > > > hypervisor to scan the internal message buffer queue(s) associated with
> > > > > the virtual processor.
> > > > >
> > > > > If a message buffer queue contains a queued message buffer, the hypervisor
> > > > > attempts to deliver the message.
> > > > >
> > > > > Message delivery succeeds if the SIM page is enabled and the message slot
> > > > > corresponding to the SINTx is empty (that is, the message type in the
> > > > > header is set to HvMessageTypeNone).
> > > > > If a message is successfully delivered, its corresponding internal message
> > > > > buffer is dequeued and marked free.
> > > > > If the corresponding SINTx is not masked, an edge-triggered interrupt is
> > > > > delivered (that is, the corresponding bit in the IRR is set).
> > > > >
> > > > > This register can be used by guests to poll for messages. It can also be
> > > > > used as a way to drain the message queue for a SINTx that has
> > > > > been disabled (that is, masked)."
> > > >
> > > > Doesn't this work already?
> > > >
> > > 
> > > Well, if you don't have SCONTROL and a GSI associated with the SINT then it
> > > does not...
> > 
> > Yes you do need both of these.
> > 
> > > > > So basically this means that we need to exit on EOM so the hypervisor
> > > > > will have a chance to send all the pending messages regardless of the
> > > > > SCONTROL mechanism.
> > > >
> > > > I might be misinterpreting the spec, but my understanding is that
> > > > SCONTROL {en,dis}ables the message queueing completely.  What the quoted
> > > > part means is that a write to EOM should trigger the message source to
> > > > push a new message into the slot, regardless of whether the SINT was
> > > > masked or not.
> > > >
> > > > And this (I think, haven't tested) should already work.  The userspace
> > > > just keeps using the SINT route as it normally does, posting
> > > > notifications to the corresponding irqfd when posting a message, and
> > > > waiting on the resamplerfd for the message slot to become free.  If the
> > > > SINT is masked KVM will skip injecting the interrupt, that's it.
> > > >
> > > > Roman.
> > > 
> > > That's what I was thinking originally as well, but then I noticed KDNET as a
> > > VMBus client (and it basically runs before anything else) is working in this
> > > polling mode, where SCONTROL is disabled and it just loops, and if it sees
> > > the PENDING message flag it issues an EOM to indicate it has freed
> > > the slot.
> > 
> > Who sets up the message page then?  Doesn't it enable SCONTROL as well?
> > 
> 
> KdNet is the one setting the SIMP and it's not setting the SCONTROL, I'll
> paste output of KVM traces for the relevant MSRs
> 
> > Note that, even if you don't see it being enabled by Windows, it can be
> > enabled by the firmware and/or by the bootloader.
> > 
> > Can you perhaps try with the SeaBIOS from
> > https://src.openvz.org/projects/UP/repos/seabios branch hv-scsi?  It
> > enables SCONTROL and leaves it that way.
> > 
> > I'd also suggest tracing kvm_msr events (both reads and writes) for
> > SCONTROL and SIMP msrs, to better understand the picture.
> > 
> > So far the change you propose appears too heavy to work around the
> > problem of disabled SCONTROL.  You seem to be better off just making
> > sure it's enabled (either by the firmware or slightly violating the spec
> > and initializing to enabled from the start), and sticking to the
> > existing infrastructure for posting messages.
> > 
> 
> I guess there is something I'm missing here but let's say the BIOS would
> have set the SCONTROL but the OS is not setting it, who is in charge of
> handling the interrupts?

SCONTROL doesn't enable the interrupts, it enables SynIC as a whole.
The interrupts are enabled via individual SINTx MSRs.  This SeaBIOS
branch does exactly this: it enables the SynIC via SCONTROL, and then
specific SynIC functionality via SIMP/SIEFP, but doesn't activate SINTx
and works in polling mode.

I agree that this global SCONTROL switch seems redundant but it appears
to match the spec.
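
To illustrate the split between the global switch and the per-SINT mask,
here is a rough sketch in Python.  It is not KVM or SeaBIOS code: the
helper synic_mode() is made up for this mail, and the bit positions
(enable in bit 0 of SCONTROL and SIMP, vector in bits 7:0 and mask in
bit 16 of SINTx) are my reading of the TLFS, so please double-check them.

```python
# Bit positions as I read them in the TLFS; treat them as assumptions.
SCONTROL_ENABLE = 1 << 0    # global SynIC enable
SIMP_ENABLE = 1 << 0        # message page enable
SINT_VECTOR_MASK = 0xFF     # SINTx bits 7:0: interrupt vector
SINT_MASKED = 1 << 16       # SINTx bit 16: interrupt masked


def synic_mode(scontrol, simp, sint):
    """Hypothetical helper: classify how a guest uses one SINT."""
    if not scontrol & SCONTROL_ENABLE:
        return "synic-off"          # nothing is queued at all
    if not simp & SIMP_ENABLE:
        return "no-message-page"    # SynIC on, but no slot to deliver into
    if sint & SINT_MASKED:
        return "polling"            # messages land in the slot, no interrupt
    return "interrupt vector 0x%02x" % (sint & SINT_VECTOR_MASK)
```

With SCONTROL and SIMP enabled and the SINT masked, this reports
"polling", which is the mode the SeaBIOS branch runs in.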

> > > (There are a bunch of patches I sent on the QEMU mailing list as well, where
> > > I CCed you. I will probably revise them a bit but was hoping to get KVM
> > > sorted out first).
> > 
> > I'll look through the archive, should be there, thanks.
> > 
> > Roman.
> 
> I tried testing with both the SeaBIOS branch you have suggested and the
> EDK2, unfortunately I could not get the EDK2 build to identify my VM drive
> to boot from (not sure why)
> 
> Here is an output of KVM trace for the relevant MSRs (SCONTROL and SIMP)
> 
> QEMU Default BIOS
> -----------------
>  qemu-system-x86-613   [000] ....  1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>  qemu-system-x86-613   [000] ....  1121.080722: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>  qemu-system-x86-613   [000] .N..  1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>  qemu-system-x86-613   [000] .N..  1121.095592: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
> Choose Windows DebugEntry
>  qemu-system-x86-613   [001] ....  1165.185227: kvm_msr: msr_read 40000083 = 0x0
>  qemu-system-x86-613   [001] ....  1165.185255: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
>  qemu-system-x86-613   [001] ....  1165.185255: kvm_msr: msr_write 40000083 = 0xfa1001
>  qemu-system-x86-613   [001] ....  1165.193206: kvm_msr: msr_read 40000083 = 0xfa1001
>  qemu-system-x86-613   [001] ....  1165.193236: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
>  qemu-system-x86-613   [001] ....  1165.193237: kvm_msr: msr_write 40000083 = 0xfa1000
> 
> 
> SeaBIOS hv-scsi
> ---------------
>  qemu-system-x86-656   [001] ....  1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>  qemu-system-x86-656   [001] ....  1313.072714: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1
>  qemu-system-x86-656   [001] ....  1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x0 host 1
>  qemu-system-x86-656   [001] ....  1313.087752: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 1

Initialization (host == 1)

>  qemu-system-x86-656   [001] ....  1313.156675: kvm_msr: msr_read 40000083 = 0x0
>  qemu-system-x86-656   [001] ....  1313.156680: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x7fffe001 host 0
> Choose Windows DebugEntry

I guess this is a bit misplaced timewise; the BIOS is still running here.

>  qemu-system-x86-656   [001] ....  1313.156680: kvm_msr: msr_write 40000083 = 0x7fffe001

BIOS sets up message page

>  qemu-system-x86-656   [001] ....  1313.162111: kvm_msr: msr_read 40000080 = 0x0
>  qemu-system-x86-656   [001] ....  1313.162118: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000080 data 0x1 host 0
>  qemu-system-x86-656   [001] ....  1313.162119: kvm_msr: msr_write 40000080 = 0x1

BIOS activates SCONTROL

>  qemu-system-x86-656   [001] ....  1313.246758: kvm_msr: msr_read 40000083 = 0x7fffe001
>  qemu-system-x86-656   [001] ....  1313.246764: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0x0 host 0
>  qemu-system-x86-656   [001] ....  1313.246764: kvm_msr: msr_write 40000083 = 0x0

BIOS clears message page (it's not needed once the VMBus device was
brought up)

I guess the choice of Windows DebugEntry appeared somewhere here.

>  qemu-system-x86-656   [001] ....  1348.904727: kvm_msr: msr_read 40000083 = 0x0
>  qemu-system-x86-656   [001] ....  1348.904771: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1001 host 0
>  qemu-system-x86-656   [001] ....  1348.904772: kvm_msr: msr_write 40000083 = 0xfa1001

Bootloader (debug stub?) sets up the message page

>  qemu-system-x86-656   [001] ....  1348.919170: kvm_msr: msr_read 40000083 = 0xfa1001
>  qemu-system-x86-656   [001] ....  1348.919183: kvm_hv_synic_set_msr: vcpu_id 0 msr 0x40000083 data 0xfa1000 host 0
>  qemu-system-x86-656   [001] ....  1348.919183: kvm_msr: msr_write 40000083 = 0xfa1000

Message page is being disabled again.

I guess you only filtered SCONTROL and SIMP, skipping e.g. SVERSION,
GUEST_OS_ID, HYPERCALL, etc., which are also part of the exchange here.
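
For reading the SIMP values in the trace above, a small sketch of the
decoding, assuming the layout I remember from the TLFS (bit 0 is the
enable flag, bits 63:12 hold the page address).  decode_simp() is a
made-up name, not something from KVM or QEMU.

```python
SIMP_ENABLE = 1 << 0                    # assumed: bit 0 enables the page
SIMP_ADDR_MASK = 0xFFFFFFFFFFFFF000     # assumed: bits 63:12 are the GPA


def decode_simp(value):
    """Hypothetical helper: split a SIMP MSR value into (enabled, gpa)."""
    return bool(value & SIMP_ENABLE), value & SIMP_ADDR_MASK


# Values taken from the kvm_msr lines in the trace above.
for raw in (0x7fffe001, 0x0, 0xfa1001, 0xfa1000):
    enabled, gpa = decode_simp(raw)
    print("SIMP %#x: enabled=%s gpa=%#x" % (raw, enabled, gpa))
```

Under that assumption 0xfa1001 is an enabled page at 0xfa1000, and the
later write of 0xfa1000 keeps the address but clears the enable bit.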

>  I could not get the EDK2 setup to work though
>  (https://src.openvz.org/projects/UP/repos/edk2 branch hv-scsi)
> 
> It does not detect my VM hard drive, not sure why (this is how I configured
> it:
>  -drive file=./win10.qcow2,format=qcow2,if=none,id=drive_disk0 \
>  -device virtio-blk-pci,drive=drive_disk0 \
> 
> (Is there something special I need to configure in order for it to work? I
> tried building EDK2 with and without SMM_REQUIRE and SECURE_BOOT_ENABLE)

No special configuration I can think of.

> But in general it sounds like there is something I don't fully understand
> when SCONTROL is enabled, then a GSI is associated with this SintRoute.
> 
> Then when the guest triggers an EOI via the APIC we will trigger the GSI
> notification, which will give us another go on trying to copy the message
> into its slot.

Right.

> So is it the OS that is in charge of setting the EOI?

Yes.

> If so then it needs to
> be aware of SCONTROL being enabled and just having it left set by the BIOS
> might not be enough?

Yes it needs to be aware of SCONTROL being enabled.  However, this
awareness may be based on a pure assumption that the previous entity
(BIOS or bootloader) did it already.

> Also in the TLFS (looking at v6) they mention that message queueing has "3
> exit conditions", which will cause the hypervisor to try and attempt to
> deliver the additional messages.
> 
> The 3 exit conditions they refer to are:
> * Another message buffer is queued.
> * The guest indicates the “end of interrupt” by writing to the APIC’s EOI
> register.
> * The guest indicates the “end of message” by writing to the SynIC’s EOM
> register.
> 
> Also notice this additional exit is only if there is a pending message and
> not for every EOM.

This meaning of "exit" doesn't trivially correspond to what we have in
KVM.  A write to an msr does cause a vmexit.  Then KVM notifies resample
eventfds for all SINTs that have them set up, no matter if there's a
pending message in the slot.  It may be slightly more optimal to only
notify those having indicated a pending message, but I don't see the
current behavior break anything or violate the spec, so, as EOMs are not
used on fast paths, I wouldn't bother optimizing.
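
The difference between the two policies can be shown with a toy model in
Python.  This is not the KVM code; the dict fields and the function
sints_to_notify() are invented for illustration only.

```python
def sints_to_notify(sints, only_pending=False):
    """Toy model: which SINTs get their resample eventfd notified on EOM.

    sints is a list of dicts with two made-up boolean fields:
      has_resamplefd - a resample eventfd is set up for this SINT
      msg_pending    - the message slot has the pending flag set
    """
    return [
        i for i, s in enumerate(sints)
        if s["has_resamplefd"] and (s["msg_pending"] or not only_pending)
    ]


sints = [
    {"has_resamplefd": True, "msg_pending": False},
    {"has_resamplefd": True, "msg_pending": True},
    {"has_resamplefd": False, "msg_pending": True},
]
print(sints_to_notify(sints))                     # current behavior
print(sints_to_notify(sints, only_pending=True))  # possible optimization
```

In this model the current behavior notifies SINTs 0 and 1, while the
optimized variant would notify only SINT 1; the extra notification for
SINT 0 is harmless because userspace just finds nothing to post.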

Roman.


