On 10/01/2010 04:46 AM, Alexander Graf wrote:
On 01.10.2010, at 13:21, Nadav Har'El wrote:
On Thu, Sep 30, 2010, Zachary Amsden wrote about "Re: TSC in nested SVM and VMX":
1) When reading an MSR, we are not emulating the L2 guest; we are
DIRECTLY reading the MSR for the L1 emulation. Any emulation of the L2
guest is actually done by the code running /inside/ the L1 emulation, so
MSR reads for the L2 guest are handled by L1, and MSR reads for the L1
guest are handled by L0, which is this code.
...
So if we are currently running nested, the L1 tsc_offset is stored in
the nested.hsave field; the vmcb which is active is polluted by the L2
guest offset, which would be incorrect to return to the L1 emulation.
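To illustrate the point about which offset gets returned, here is a minimal sketch. The struct and function names (vmcb_ctl, l1_tsc_offset) are made up for this example and are not the KVM definitions; only the tsc_offset field and the hsave idea come from the explanation above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vmcb control area; only the field we need. */
struct vmcb_ctl {
    uint64_t tsc_offset;
};

/*
 * Return the TSC offset that belongs to L1.
 * While nested, the active vmcb carries the combined L1+L2 offset,
 * so the L1 value has to come from the saved host state (hsave).
 */
static uint64_t l1_tsc_offset(const struct vmcb_ctl *active,
                              const struct vmcb_ctl *hsave,
                              bool nested)
{
    return nested ? hsave->tsc_offset : active->tsc_offset;
}
```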
Thanks for the detailed explanation.
It seems, then, that the nested VMX logic is somewhat different from that
of the nested SVM. In nested VMX, if a function gets called when running
L1, the current VMCS will be that of L1 (aka vmcs01), not of its guest L2
(and I'm not even sure *which* L2 that would be when there are multiple
L2 guests for the one L1).
If the #vmexit comes while you're in L1, everything works on L1's vmcb. If you hit it while in L2, everything works on L2's vmcb unless special care is taken.
The reason behind the TSC shift is very simple. With the tsc_offset setting we're trying to adjust the L1's offset. Adjusting the L1's offset means we need to adjust L1 and L2 alike, as the virtual L2's offset == L1 offset + vmcb L2 offset, because L2's TSC is also offset by the amount L1 is.
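As a worked example of that relation, here is a small sketch with made-up numbers. The helper names (effective_l2_offset, guest_tsc) are hypothetical and exist only for this illustration. Offsets are treated as 64-bit two's-complement values, so a negative offset is simply a large unsigned number and the additions wrap as expected.

```c
#include <stdint.h>

/* Hypothetical helper: offset the hardware must apply while L2 runs. */
static uint64_t effective_l2_offset(uint64_t l1_offset,
                                    uint64_t nested_vmcb_offset)
{
    /* L2 sits on top of L1, so both offsets add up. */
    return l1_offset + nested_vmcb_offset;
}

/* TSC value a guest observes for a given host TSC and offset. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
{
    return host_tsc + offset;
}
```

With a host TSC of 5000, an L1 offset of -1000 and an L2 offset of 300 in the nested vmcb, L1 reads 4000 and L2 reads 4300. If the L1 offset later changes, the L2 reading has to move by the same amount.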
So basically what happens is:
nested VMRUN:
svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
please note the +=!
svm_write_tsc_offset:
This gets called when we really want to set the current level's TSC offset only, because the guest issued a TSC write. In L2 this means the L2's value.
if (is_nested(svm)) {
g_tsc_offset = svm->vmcb->control.tsc_offset -
svm->nested.hsave->control.tsc_offset;
Remember the difference between L1 and L2.
svm->nested.hsave->control.tsc_offset = offset;
Set L1 to the new offset
}
svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
Set L2 to new offset + delta.
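Putting the pieces of the walkthrough together, a self-contained sketch could look like the following. The types and function names (vmcb_sketch, svm_sketch, nested_vmrun_tsc, write_tsc_offset_sketch) are simplified stand-ins invented for this example, not the real KVM structures, and the nested flag stands in for is_nested(svm).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in: only the field discussed in this thread. */
struct vmcb_sketch {
    uint64_t tsc_offset;
};

struct svm_sketch {
    struct vmcb_sketch *vmcb;   /* active vmcb, carries L2 state while nested */
    struct vmcb_sketch *hsave;  /* saved L1 state while nested */
    bool nested;                /* assumption: stands in for is_nested(svm) */
};

/* Nested VMRUN: keep L1's offset in hsave and stack L2's on top. */
static void nested_vmrun_tsc(struct svm_sketch *svm,
                             const struct vmcb_sketch *nested_vmcb)
{
    svm->hsave->tsc_offset = svm->vmcb->tsc_offset;
    svm->vmcb->tsc_offset += nested_vmcb->tsc_offset;  /* note the += */
    svm->nested = true;
}

/* Treat the write as an L1 write and carry the L1/L2 delta along. */
static void write_tsc_offset_sketch(struct svm_sketch *svm, uint64_t offset)
{
    uint64_t g_tsc_offset = 0;

    if (svm->nested) {
        /* Remember the difference between L1 and L2. */
        g_tsc_offset = svm->vmcb->tsc_offset - svm->hsave->tsc_offset;
        /* Set L1 to the new offset. */
        svm->hsave->tsc_offset = offset;
    }
    /* Set the active vmcb to the new offset plus the delta. */
    svm->vmcb->tsc_offset = offset + g_tsc_offset;
}
```

Starting from an L1 offset of 100 and a nested vmcb offset of 40, the active vmcb holds 140 after the nested VMRUN. Writing a new offset of 500 then leaves hsave at 500 and the active vmcb at 540, so the delta of 40 is preserved.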
So what this function does is that it treats TSC writes as L1 writes even while in L2 and adjusts L2 accordingly. Joerg, this sounds fishy to me. Are you sure this is intended and works when L1 doesn't intercept MSR writes to TSC?
L1 must intercept MSR writes to TSC for this to work. It does, so all
is well.