Re: Bug in KVM clock backwards compensation

Joerg Roedel <joro@xxxxxxxxxx> · Fri, 29 Apr 2011 10:40:54 +0200

On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
> On 04/28/2011 01:20 PM, Joerg Roedel wrote:

>> This code checks how many guest tsc cycles have passed since this vCPU
>> was de-scheduled last time (and before it is running again). So since
>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>> Further, the other parameters like the TSC offset and the scaling
>> multiplier havn't changed too, so the only variable in the guest-tsc
>> calculation is the host-tsc.
>> So this calculation using the guest-tsc can detect backwards going
>> host-tsc as good as the old one. The benefit here is that we can feed
>> consistent values into adjust_tsc_offset().
>>    
>
> While true, this is more complex than the original code.  The original  
> code here doesn't try to actually compensate for the guest TSC  
> difference - instead what it does is NULL any discovered host TSC delta:

Why should KVM care about the host-tsc going backwards when the
guest-tsc moves forward? I think the bottom-line of this code is to make
sure the guest-tsc does not jump around (or even jumps backwards).

I also don't agree that this is additional complexity. With these
changes we were able to get rid of the last_host_tsc variable which is
actually a simplification.

As I see it, there are two situations here:

1) On hosts without tsc-scaling the value of tsc_delta calculated from
   the guest-tsc values is the same as when calculated with host-tsc
   values, because the guest-tsc only differs by an offset from the
   host-tsc.

2) On hosts with tsc-scaling these two tsc_delta values may be
   different. If the guest has a different tsc-freq as the host, we
   can't simply adjust the tsc by an offset calculated from the host-tsc
   values. So tsc_delta has to be calculated using the guest-tsc.
>
>         if (tsc_delta < 0)
>             mark_tsc_unstable("KVM discovered backwards TSC");
>         if (check_tsc_unstable()) {
>             kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
>             vcpu->arch.tsc_catchup = 1;
>         }
>         kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> Erasing that delta also erases elapsed time since the VCPU has last been  
> run, which isn't desirable, so it then sets tsc_catchup mode, which will  
> restore the proper TSC.  The request here triggers code which later  
> updates the TSC offset again.

I just looked again at the tsc_catchup thing. Since the clock-update is
forced in the code above, and the clock-update-code adjusts the tsc
itself if necessary, is it really necessary to do this here at all?

> I left compute_guest_tsc in place to recompute time in guest units here,  
> even if the underlying hardware rate changes.
>
>         /*
>          * We may have to catch up the TSC to match elapsed wall clock
>          * time for two reasons, even if kvmclock is used.
>          *   1) CPU could have been running below the maximum TSC rate
>          *   2) Broken TSC compensation resets the base at each VCPU
>          *      entry to avoid unknown leaps of TSC even when running
>          *      again on the same CPU.  This may cause apparent elapsed
>          *      time to disappear, and the guest to stand still or run
>          *      very slowly.
>          */
>         if (vcpu->tsc_catchup) {
>                 u64 tsc = compute_guest_tsc(v, kernel_ns);
>                 if (tsc > tsc_timestamp) {
>                         kvm_x86_ops->adjust_tsc_offset(v, tsc -  
> tsc_timestamp);
>                         tsc_timestamp = tsc;
>                 }
>         }
>
> So yeah, the code is getting pretty complex but we'd like to avoid that  
> as much as possible - so I would prefer to have the hardware backwards  
> compensation separate from the guest rate computation by doing this:
>
> step 1) remove any backwards hardware TSC delta (in hardware units)
> step 2) recompute guest TSC from a stable clock (gotten from kernel_ns)  
> and apply adjustment (in guest units)
>
> So it appears you can just share most of the logic of guest TSC catchup  
> mode.
>
> What do you think?

With the compensation done in kvm_guest_time_update I see no reason
anymore to adjust the tsc in vcpu_load at all. So if we can remove the
adjustment from there we can switch back to host-tsc there.

The question still is whether this needs to be done in KVM. For older
kernels Jan will take patches that handle this in kvm-kmod. So this code
can probably be removed from vcpu_load. The benefit is that we don't
need to re-introduce the last_host_tsc.

If we still need to call adjust_tsc_offset() in vcpu_load then we have
to do the calculation with the guest_tsc because this gives us a
tsc_delta in units that adjust_tsc_offset expects.

Regards,

	Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html