Re: What's kvmclock's custom sched_clock for?

On Thu, Jan 7, 2016 at 2:56 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Wed, Jan 06, 2016 at 11:18:51PM -0800, Andy Lutomirski wrote:
>> AFAICT KVM reliably passes a monotonic TSC through to guests,
>
> It does not.

Under what circumstances does it go backwards?  All hosts support tsc
offsets, I think, and the host code knows how to prevent the clock
from going backwards even on host suspend.

Does migration make the TSC go backwards?  If so, that's impolite and
it would be nice to fix it.

On Thu, Jan 7, 2016 at 7:18 AM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
> 2016-01-07 00:41-0800, Andy Lutomirski:
>> On Wed, Jan 6, 2016 at 11:18 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> AFAICT KVM reliably passes a monotonic TSC through to guests, even if
>>> the host suspends.  That's all that sched_clock needs, I think.
>>>
>>> So why does kvmclock have a custom sched_clock?
>
> If the host CPU has enough features, then yes, KVM can take care of
> everything and kvmclock has no advantage over TSC, even when migrating
> to a host whose TSC runs at a different frequency, as modern CPUs
> support TSC offset + scaling in guests.
>
> The problem is with antiques.  Guests on old CPUs need to have more
> information on top of TSC to be able to get useful system time.
> And old KVM doesn't provide good information, so we have legacy layers
> everywhere.
>
> kvmclock in the guest can simply equal rdtsc() on modern CPUs, but we
> still want to use the kvmclock wrapper, because kvmclock can provide a
> stable clock regardless of the underlying TSC (in theory).

OK, makes sense.
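
(For anyone reading along: a rough sketch of what "TSC offset + scaling"
means arithmetically, assuming the usual fixed-point form where the
guest value is the host TSC times a ratio, plus an offset.  The function
name, the parameter names and the frac_bits parameter are my own
placeholders, not KVM's, and unsigned __int128 is a GCC/Clang extension.)

```c
#include <stdint.h>

/*
 * Hypothetical helper, not KVM code.
 * guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset
 * ratio is a fixed-point multiplier with frac_bits fractional bits.
 */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t ratio,
                          unsigned frac_bits, int64_t offset)
{
    /* 128-bit intermediate so the multiplication cannot overflow. */
    unsigned __int128 scaled = (unsigned __int128)host_tsc * ratio;

    /* Adding the offset as unsigned gives well-defined wraparound. */
    return (uint64_t)(scaled >> frac_bits) + (uint64_t)offset;
}
```

With a ratio of exactly 1.0 and a zero offset the guest just sees the
host value, which is the "kvmclock has no advantage over TSC" case.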

>
>>> On a related note, KVM doesn't pass the "invariant TSC" feature
>>> through to guests on my machine even though "invtsc" is set in QEMU
>>> and the kernel host code appears to support it.  What gives?
>>
>> I think I solved part of the puzzle.  KVM doesn't like to advertise
>> invtsc by default because that breaks migration.  (Oddly, the end
>> result seems wrong -- with migration, the TSC doesn't stop, but it's
>> not constant, and X86_FEATURE_CONSTANT_TSC is nonetheless set, but
>> whatever.)
>
> QEMU probably missed that because X86_FEATURE_CONSTANT_TSC is a function
> of family/model.  (CONSTANT_TSC is the same as invariant TSC as KVM
> guests don't have c-states.)
>
>>             So the scheduler clock doesn't get marked stable.
>
> Stable sched clock is quite unrelated to TSC features.  KVMs from the
> last few years should always give a good enough result to allow a
> stable sched clock.  We wanted realtime guests, and realtime Linux needs
> nohz_full, which depends on a stable sched clock.  The result is a huge
> hack.
>
> We'd need to say that migration creates powerful gravity fields to
> faithfully migrate constant/invariant TSC, but stable sched clock
> doesn't have such strict expectations about time.
>
>> Is that it?
>>
>> This still doesn't explain why even explicitly trying to set invtsc
>> doesn't seem to work.
>
> Seems like a bug.  My cpuid is
>    0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
> and QEMU says
>   warning: host doesn't support requested feature: CPUID.80000007H:EDX.invtsc [bit 8]
>
> I'll see if it's in KVM or QEMU.  (We should only forbid migrations to
> hosts with different frequency and without guest TSC scaling.)

If I do -cpu host,migratable=off,+invtsc, then it works.  Maybe QEMU
is just being too strict.  This is Skylake.
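
For reference, the bit in question can be decoded from a raw EDX value of
leaf 0x80000007 like this.  It is only a sketch: the helper and macro
names are made up, and the bit position is taken from the QEMU warning
quoted above.

```c
#include <stdint.h>

/* Bit position from the warning: CPUID.80000007H:EDX.invtsc [bit 8]. */
#define INVTSC_EDX_BIT 8u

/*
 * Hypothetical helper: returns 1 if the EDX value of CPUID leaf
 * 0x80000007 advertises invariant TSC, 0 otherwise.
 */
static int has_invtsc(uint32_t edx)
{
    return (int)((edx >> INVTSC_EDX_BIT) & 1u);
}
```

Feeding it edx=0x00000100 from Radim's dump gives 1.  In a guest, the
EDX value could be read with __get_cpuid() from GCC/Clang's <cpuid.h>,
assuming an x86 build.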

-- 
Andy Lutomirski
AMA Capital Management, LLC