Signed-off-by: Nikunj A. Dadhania <nikunj@xxxxxxxxxxxxxxxxxx>
---
 Documentation/virtual/kvm/msr.txt                |    4 ++
 Documentation/virtual/kvm/paravirt-tlb-flush.txt |   53 ++++++++++++++++++++++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/virtual/kvm/paravirt-tlb-flush.txt

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 7304710..92a6af6 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -256,3 +256,7 @@ MSR_KVM_EOI_EN: 0x4b564d04
 	guest must both read the least significant bit in the memory area and
 	clear it using a single CPU instruction, such as test and clear, or
 	compare and exchange.
+
+MSR_KVM_VCPU_STATE: 0x4b564d05
+
+Refer: Documentation/virtual/kvm/paravirt-tlb-flush.txt
diff --git a/Documentation/virtual/kvm/paravirt-tlb-flush.txt b/Documentation/virtual/kvm/paravirt-tlb-flush.txt
new file mode 100644
index 0000000..0eaabd7
--- /dev/null
+++ b/Documentation/virtual/kvm/paravirt-tlb-flush.txt
@@ -0,0 +1,53 @@
+KVM - Paravirt TLB Flush
+Nikunj A Dadhania <nikunj@xxxxxxxxxxxxxxxxxx>, IBM, 2012
+========================================================
+
+The remote TLB flush APIs busy-wait, which is fine on bare metal.
+Within a guest, however, the target vcpus might have been preempted
+or blocked. In that case the initiator vcpu ends up busy-waiting for
+a long time.
+
+To avoid this, the guest needs to know whether each of its vcpus is
+currently running. The following MSR exposes the vcpu running state
+to the guest.
+
+Using this MSR we have implemented a paravirtual TLB flush that does
+not wait for vcpus that are not running. The TLB flush for those vcpus
+is deferred and performed on guest entry.
+
+MSR_KVM_VCPU_STATE: 0x4b564d05
+
+	data: 64-byte aligned physical address of a memory area which must be
+	in guest RAM, plus an enable bit in bit 0.
+	This memory is expected to hold a copy of the following structure:
+
+	struct kvm_vcpu_state {
+		__u64 state;
+		__u32 pad[14];
+	};
+
+	whose data is filled in by both the hypervisor and the guest. Only
+	one write, or registration, is needed for each VCPU. The interval
+	between updates of this structure is arbitrary and
+	implementation-dependent. The hypervisor may update this structure
+	at any time it sees fit until a value with bit 0 clear is written
+	to the MSR. The guest must make sure this structure is initialized
+	to zero.
+
+	This lets a VCPU know the running status of its sibling VCPUs. The
+	information can then be used to avoid sending an IPI to a VCPU that
+	is not running and waiting for it unnecessarily, for example in
+	flush_tlb_others_ipi.
+
+	Fields have the following meanings:
+
+		state: has the following bit fields:
+
+		Bit 0 - vcpu running state. The hypervisor sets this
+		        bit to reflect whether the vcpu is running.
+		        Value 1 means the vcpu is running and value 0
+		        means the vcpu is preempted.
+
+		Bit 1 - TLB flush request. When set, the hypervisor
+		        should flush the TLB during guest enter/exit.
+
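For illustration only, the decision described in the document (send an IPI
to running vcpus, defer the flush for the others) could be modelled in
user space roughly as below. This is a sketch under stated assumptions, not
the code from this series: the names vcpu_state_model, flush_needs_ipi,
flush_tlb_others_model and the VCPU_STATE_* macros are hypothetical, and
using a compare-and-exchange to set bit 1 is an assumption about how a
guest might avoid racing with the hypervisor updating bit 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 'state' field as described in the document. */
#define VCPU_STATE_RUNNING      (UINT64_C(1) << 0) /* bit 0: vcpu is running */
#define VCPU_STATE_SHOULD_FLUSH (UINT64_C(1) << 1) /* bit 1: flush on guest enter */

/* Hypothetical user-space stand-in for the shared per-vcpu area. */
struct vcpu_state_model {
	_Atomic uint64_t state;
	uint32_t pad[14];
};

/*
 * Decide how to flush one remote vcpu.
 * Returns true when the initiator must send an IPI and wait,
 * false when the flush was deferred by setting bit 1.
 */
static bool flush_needs_ipi(struct vcpu_state_model *vs)
{
	uint64_t old = atomic_load(&vs->state);

	while (!(old & VCPU_STATE_RUNNING)) {
		/* Not running: request a deferred flush atomically. */
		if (atomic_compare_exchange_weak(&vs->state, &old,
						 old | VCPU_STATE_SHOULD_FLUSH))
			return false;
		/* The failed exchange reloaded 'old'; re-check bit 0. */
	}
	return true;
}

/* Returns how many of the n targets need an IPI; the rest are deferred. */
unsigned int flush_tlb_others_model(struct vcpu_state_model *vcpus,
				    unsigned int n)
{
	unsigned int ipis = 0;

	assert(vcpus || n == 0);
	for (unsigned int i = 0; i < n; i++)
		if (flush_needs_ipi(&vcpus[i]))
			ipis++;
	return ipis;
}
```

In this model the initiator only waits for the vcpus counted in the return
value. A vcpu that starts running between the load and the exchange makes
the exchange fail, so it falls back to the IPI path instead of being
skipped.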