On 04/29/2011 01:40 AM, Joerg Roedel wrote:
On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
On 04/28/2011 01:20 PM, Joerg Roedel wrote:
This code checks how many guest tsc cycles have passed since this vCPU
was de-scheduled last time (and before it is running again). So since
the vCPU hasn't run in the meantime it had no chance to change its TSC.
Further, the other parameters like the TSC offset and the scaling
multiplier haven't changed either, so the only variable in the guest-tsc
calculation is the host-tsc.
So this calculation using the guest-tsc can detect a backwards-going
host-tsc as well as the old one could. The benefit here is that we can
feed consistent values into adjust_tsc_offset().
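A rough user-space sketch of that arithmetic (the names vtsc, scale_mult,
tsc_offset and guest_tsc below are made up for illustration; they are not
KVM's actual fields or helpers):

#include <stdint.h>

/*
 * Illustrative only: models how a guest TSC reading is derived from the
 * host TSC via a fixed scaling multiplier and offset.
 */
struct vtsc {
	uint64_t scale_mult;	/* guest cycles per host cycle, Q32 fixed point */
	int64_t  tsc_offset;	/* added after scaling */
};

static uint64_t guest_tsc(const struct vtsc *v, uint64_t host_tsc)
{
	return (uint64_t)(((__uint128_t)host_tsc * v->scale_mult) >> 32) +
	       v->tsc_offset;
}

/*
 * While the vCPU is de-scheduled neither scale_mult nor tsc_offset can
 * change, so the sign of the guest-side delta tracks the host-side delta:
 * a host TSC that went backwards shows up as tsc_delta < 0 here as well.
 */
static int64_t guest_tsc_delta(const struct vtsc *v,
			       uint64_t host_tsc_then, uint64_t host_tsc_now)
{
	return (int64_t)(guest_tsc(v, host_tsc_now) -
			 guest_tsc(v, host_tsc_then));
}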
While true, this is more complex than the original code. The original
code here doesn't try to actually compensate for the guest TSC
difference - instead, what it does is null out any discovered host TSC
delta:
Why should KVM care about the host-tsc going backwards when the
guest-tsc moves forward? I think the bottom line of this code is to make
sure the guest-tsc does not jump around (or even jump backwards).
I also don't agree that this is additional complexity. With these
changes we were able to get rid of the last_host_tsc variable, which is
actually a simplification.
As I see it, there are two situations here:
1) On hosts without tsc-scaling the value of tsc_delta calculated from
   the guest-tsc values is the same as when calculated with host-tsc
   values, because the guest-tsc differs from the host-tsc only by a
   constant offset.
2) On hosts with tsc-scaling these two tsc_delta values may be
   different. If the guest has a different tsc-freq than the host, we
   can't simply adjust the tsc by an offset calculated from the host-tsc
   values. So tsc_delta has to be calculated using the guest-tsc.
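A small self-contained example with hypothetical numbers for case 2): a
host TSC at 3 GHz and a guest scaled to 1.5 GHz see the same millisecond
as 3,000,000 and 1,500,000 cycles respectively, so feeding the raw host
delta into adjust_tsc_offset() would over-adjust by a factor of two:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* hypothetical: host TSC at 3 GHz, guest scaled to 1.5 GHz */
	uint64_t host_delta  = 3000000;		/* 1 ms in host cycles  */
	uint64_t guest_delta = host_delta / 2;	/* 1 ms in guest cycles */

	/*
	 * Adjusting the TSC offset by host_delta would make the guest
	 * clock jump by 2 ms of guest time; the adjustment has to use
	 * guest_delta instead.
	 */
	printf("host delta:  %llu cycles\n", (unsigned long long)host_delta);
	printf("guest delta: %llu cycles\n", (unsigned long long)guest_delta);
	return 0;
}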
	if (tsc_delta < 0)
		mark_tsc_unstable("KVM discovered backwards TSC");
	if (check_tsc_unstable()) {
		kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
		vcpu->arch.tsc_catchup = 1;
	}
	kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
Erasing that delta also erases the elapsed time since the VCPU was last
run, which isn't desirable, so it then sets tsc_catchup mode, which will
restore the proper TSC. The request here triggers code which later
updates the TSC offset again.
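A simplified model of that two-phase sequence (the structure and function
names below are illustrative, not KVM's):

#include <stdint.h>

/*
 * Phase 1 runs in the preempt notifier and nulls whatever guest-TSC delta
 * was discovered across the CPU switch; phase 2 runs when the
 * KVM_REQ_CLOCK_UPDATE request is serviced and adds back the guest time
 * that really elapsed while the vCPU was off the CPU.
 */
struct vcpu_clock {
	int64_t tsc_offset;	/* added to the (scaled) host TSC */
	int     tsc_catchup;	/* phase 2 still pending */
};

static void phase1_null_delta(struct vcpu_clock *c, int64_t tsc_delta)
{
	c->tsc_offset  -= tsc_delta;	/* guest sees no jump at vcpu_load time */
	c->tsc_catchup  = 1;		/* remember that elapsed time was erased */
}

static void phase2_clock_update(struct vcpu_clock *c, int64_t elapsed_guest_cycles)
{
	if (c->tsc_catchup) {
		c->tsc_offset += elapsed_guest_cycles;	/* restore elapsed time */
		c->tsc_catchup = 0;
	}
}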
I just looked again at the tsc_catchup thing. Since the clock update is
forced in the code above, and the clock-update code adjusts the tsc
itself if necessary, is it really necessary to do this here at all?
Unfortunately, yes, it is. The code in the second, catchup phase can
only correct an under-adjusted clock, or we risk setting the guest clock
backwards from what has already been observed. So to guarantee there is
not a huge jump due to over-adjustment, we must eliminate the TSC delta
when switching CPUs, and that needs to happen in the preempt notifier,
not in the clock update handler.
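A sketch of that constraint (again, not KVM's code): once the guest may
already have observed its current TSC, the catchup correction can only
ever be positive, so any over-adjustment has to be prevented earlier, in
the preempt notifier:

#include <stdint.h>

/*
 * target_tsc is where the guest clock ought to be, current_tsc is what
 * the guest would read right now.  The correction must never be
 * negative, or the guest would see its TSC go backwards.
 */
static int64_t safe_catchup(uint64_t target_tsc, uint64_t current_tsc)
{
	if (target_tsc <= current_tsc)
		return 0;				/* cannot go backwards */
	return (int64_t)(target_tsc - current_tsc);	/* forward only */
}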
I agree it is simpler to do everything in terms of the guest clock rate,
but there is one variable thrown in to complicate things here: the host
clock rate may have changed during this whole process.
Let me look at the code again and see if it is possible to get rid of
the first, host-based adjustment entirely. Ideally we would just track
things in terms of the guest clock and do all the computation inside the
clock update request handler instead of having one adjustment in the
preempt notifier and a separate one later when the clock update happens.
I had originally tried something like that but ran into issues where the
rate computation got exceedingly difficult. Maybe the code has been
restructured enough now that it will work (the clock update didn't use
to happen unless you had KVM clock...)
Zach