On Sun, May 13, 2012 at 06:13:22PM +0300, Michael S. Tsirkin wrote:
> Document the new EOI MSR.
>
> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> ---
>
> This documents my PV EOI patchset and applies on top.
> Will make it part of the patchset on the next respin.
>
>  Documentation/virtual/kvm/msr.txt |   56 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 56 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
> index 5031780..bdbd337 100644
> --- a/Documentation/virtual/kvm/msr.txt
> +++ b/Documentation/virtual/kvm/msr.txt
> @@ -219,3 +219,59 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>  		steal: the amount of time in which this vCPU did not run, in
>  		nanoseconds. Time during which the vcpu is idle, will not be
>  		reported as steal time.
> +
> +MSR_KVM_EOI_EN: 0x4b564d04
> +	data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
> +	when disabled. When enabled, bits 63-1 hold 2-byte aligned physical address
> +	of a 2 byte memory area which must be in guest RAM and must be zeroed.
> +
> +	The first, least significant bit of 2 byte memory location will be
> +	written to by the hypervisor, typically at the time of interrupt
> +	injection. Value of 1 means that guest can skip writing EOI to the apic
> +	(using MSR or MMIO write); instead, it is sufficient to signal
> +	EOI by clearing the bit in guest memory - this location will
> +	later be polled by the hypervisor.
> +	Value of 0 means that the EOI write is required.
> +
> +	It is always safe for the guest to ignore the optimization and perform
> +	the APIC EOI write anyway.
> +
> +	Hypervisor is guaranteed to only modify this least
> +	significant bit while in the current VCPU context, this means that
> +	guest does not need to use either lock prefix or memory ordering
> +	primitives to synchronise with the hypervisor.
> +
> +	However, hypervisor can set and clear this memory bit at any time:
> +	therefore to make sure hypervisor does not interrupt the
> +	guest and clear the least significant bit in the memory area
> +	in the window between guest testing it to detect
> +	whether it can skip EOI apic write and between guest
> +	clearing it to signal EOI to the hypervisor,
> +	guest must both read the least significant bit in the memory area and
> +	clear it using a single CPU instruction, such as test and clear, or
> +	compare and exchange.
> +

Looks good, but everything below this is here by mistake. Are you still
going to resend the host side patch to address my other comment?

> +the page referred to by the page fault is not
> +	present. Value 2 means that the page is now available. Disabling
> +	interrupt inhibits APFs. Guest must not enable interrupt
> +	before the reason is read, or it may be overwritten by another
> +	APF. Since APF uses the same exception vector as regular page
> +	fault guest must reset the reason to 0 before it does
> +	something that can generate normal page fault. If during page
> +	fault APF reason is 0 it means that this is regular page
> +	fault.
> +
> +	During delivery of type 1 APF cr2 contains a token that will
> +	be used to notify a guest when missing page becomes
> +	available. When page becomes available type 2 APF is sent with
> +	cr2 set to the token associated with the page. There is special
> +	kind of token 0xffffffff which tells vcpu that it should wake
> +	up all processes waiting for APFs and no individual type 2 APFs
> +	will be sent.
> +
> +	If APF is disabled while there are outstanding APFs, they will
> +	not be delivered.
> +
> +	Currently type 2 APF will be always delivered on the same vcpu as
> +	type 1 was, but guest should not rely on that.
> +
> --
> MST

--
Gleb.
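
For illustration only (this is not part of the patch or of the review above), the guest-side protocol described in the quoted MSR_KVM_EOI_EN text could be sketched in C roughly as follows. All function and macro names other than MSR_KVM_EOI_EN are hypothetical, the APIC write is a counting stub, and the GCC/Clang builtin `__atomic_fetch_and` is only a portable stand-in: it is stronger than needed and is not guaranteed to compile to the single unlocked test-and-clear instruction the text asks for, which a real x86 guest would likely emit directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_KVM_EOI_EN    0x4b564d04u /* MSR index, from the quoted text */
#define PV_EOI_ENABLE_BIT 1ull        /* hypothetical name for bit 0 */

/* Stub standing in for the real APIC EOI write (MSR or MMIO). */
static unsigned apic_eoi_writes;
static void apic_write_eoi(void) { apic_eoi_writes++; }

/*
 * Build the value a guest would write to MSR_KVM_EOI_EN: bit 0 enables
 * PV EOI, bits 63-1 carry the 2-byte aligned guest physical address of
 * the zeroed 2-byte area.
 */
static uint64_t pv_eoi_msr_value(uint64_t gpa)
{
	assert((gpa & 1) == 0); /* must be 2-byte aligned */
	return gpa | PV_EOI_ENABLE_BIT;
}

/*
 * Read and clear the least significant bit in one operation, so the
 * hypervisor cannot change it between the test and the clear.
 * Assumption: a real guest would use one non-locked instruction here.
 */
static bool pv_eoi_test_and_clear(uint16_t *area)
{
	uint16_t old = __atomic_fetch_and(area, (uint16_t)~1u, __ATOMIC_RELAXED);
	return (old & 1u) != 0;
}

/* Guest EOI path: skip the APIC write only if the bit was set. */
static void guest_eoi(uint16_t *area)
{
	if (pv_eoi_test_and_clear(area))
		return;         /* clearing the bit signalled EOI */
	apic_write_eoi();       /* bit was 0: the EOI write is required */
}
```

With the area holding 0, `guest_eoi()` falls back to the APIC write; with bit 0 set by the hypervisor, it clears the bit and skips the write. As the quoted text notes, doing the APIC write unconditionally would also be safe.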