Re: [PATCH 5/6] KVM: X86: Delegate tsc-offset calculation to architecture code

Zachary Amsden <zamsden@xxxxxxxxxx> · Fri, 11 Feb 2011 17:12:29 -0500

On 02/09/2011 12:29 PM, Joerg Roedel wrote:
With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.

Signed-off-by: Joerg Roedel<joerg.roedel@xxxxxxx>
---
  arch/x86/include/asm/kvm_host.h |    1 +
  arch/x86/kvm/svm.c              |   11 ++++++++++-
  arch/x86/kvm/vmx.c              |    6 ++++++
  arch/x86/kvm/x86.c              |   10 +++++-----
  4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9686950..8c40425 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -593,6 +593,7 @@ struct kvm_x86_ops {
  	void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);

  	bool (*use_virtual_tsc_khz)(struct kvm_vcpu *vcpu);
+	u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
   

So I've gone over this series and the only issue I see so far is with 
this patch, and it doesn't have to do with upstream code, rather with 
code I was about to send.

Logically, the compensation done by adjust_tsc_offset should also be 
scaled; currently, this happens only for reasons, both of which are 
meant to deal with unstable TSCs; since TSC scaling won't happen on 
those processors with unstable TSCs, we don't need to worry about it there.

I have an upcoming patch which does complicate things a bit, which deals 
with host suspend.  In that case, the host TSC goes backwards and the 
offsets needs to be recomputed.  However, there is no convenient time to 
set the offset again; on VMX, the hardware will not yet be set up and so 
can't easily write the offset field in the VMCS.  We also can't put a 
synchronization barrier on all the VCPUs to write the offset before they 
start running without getting into a difficult situation with locking.

So instead, the approach I took was to re-use the adjust_tsc_offset 
function and accumulate offsets to apply.

For SVM with TSC scaling, the offset to apply as an adjustment in this 
case needs to be scaled.  Setting guest TSC (gtsc) equal to the new 
guest TSC (gstc'), we have:

gtsc = htsc * mult + offset
gstc' = htsc' * mult + offset'
gtsc' = gtsc
offset' = htsc * mult + offset - htsc' * mult
offset' = (htsc - htsc') * mult + offset

so, delta offset needs to = (htsc - htsc') * mult

We will instead be passing (htsc - htsc') as the adjustment value; the 
solution seems simple, we have to scale it up as well in the 
adjust_tsc_offset function.

However, the problem is that we need a new architecture specific 
function or API change because not all call sites for adjust_tsc want to 
have adjustments scaled - the call site dealing with tsc_catchup is 
actually working in guest cycles already, so should not be scaled again.

We could have separate functions to adjust TSC cycles by either guest or 
host cycle amounts, or pass a flag to adjust_tsc_offset indicating 
whether the adjustment is to be applied in guest cycles or host cycles.

The resulting API will be slightly asymmetric, as compute_tsc_offset 
lets the generic code compute in terms of hardware offsets, but in the 
adjustment case, there isn't an easy way to expose the ability to 
compute in hardware offset terms.

One slight pity is that we won't be able to resue 
svm_compute_tsc_offset, as the applied delta won't be based off a read 
of the tsc.  I can't really find a better API though, in case offsets 
are computed differently on different hardware (such as multiplying 
after the offset), then we need a function to convert guest cycles back 
to hardware cycles.

As usual, with the TSC code, it is going to require a lot of commenting 
to explain this.

Your code in general looks good.

Cheers,

Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html