Re: [RFC] arm/cpu: fix soft lockup panic after resuming from stop

Steven Price <steven.price@xxxxxxx> · Thu, 11 Apr 2019 16:31:39 +0100

On 11/04/2019 08:27, Heyi Guo wrote:
> Hi Steve,
> 
> After reading kernel code about time keeping and something related, I've
> not got a clear picture of how we can use MSR_KVM_WALL_CLOCK_NEW to keep
> wall clock in guest VM.
> 
> 1. On X86, MSR_KVM_WALL_CLOCK_NEW is only used by the callback of system
> suspend and resume; I didn't find it used for runtime wall clock reading.

The MSR only provides an offset from the guest's view of system
time to the actual wall-clock time. As you mention, it's used on resume
to resynchronise with wall-clock time.

Looking at this more carefully, I think I may have got a bit confused
with how this works on x86. As you say MSR_KVM_WALL_CLOCK_NEW is only
used on boot (and suspend/resume) to provide the difference between the
TSC and wall-clock.

MSR_KVM_SYSTEM_TIME_NEW seems to be the MSR that provides the offset
between the host's TSC and the guest's view of time. In particular the
KVM request KVM_REQ_CLOCK_UPDATE causes this to be updated.
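
To make the relationship between the two values concrete, here is a rough
sketch of how I understand the guest side: system time is extrapolated from
the counter using a scale factor, and wall-clock time is that plus a fixed
offset. The structure and function names are invented for illustration and
are not the real x86 ABI layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only (not the x86 pvclock ABI). */
struct sketch_time_info {
	uint64_t tsc_timestamp;   /* counter value at the last host update */
	uint64_t system_time_ns;  /* guest system time at that same moment */
	uint32_t tsc_to_ns_mul;   /* 32.32 fixed-point ns per (shifted) tick */
	int8_t   tsc_shift;       /* pre-scale: shift left if >= 0, else right */
};

/* Extrapolate guest system time from the current counter value. */
static uint64_t sketch_guest_ns(const struct sketch_time_info *ti,
				uint64_t tsc_now)
{
	uint64_t delta = tsc_now - ti->tsc_timestamp;
	uint64_t mul = ti->tsc_to_ns_mul;

	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(ti->tsc_shift > -64 && ti->tsc_shift < 64);

	if (ti->tsc_shift >= 0)
		delta <<= ti->tsc_shift;
	else
		delta >>= -ti->tsc_shift;

	/* (delta * mul) >> 32, split so the product cannot overflow 64 bits. */
	return ti->system_time_ns +
	       (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

/* Wall-clock time is the guest system time plus a fixed offset. */
static uint64_t sketch_wall_ns(uint64_t wall_offset_ns, uint64_t guest_ns)
{
	return wall_offset_ns + guest_ns;
}
```

With this shape the host only needs to rewrite the structure when the
counter-to-time relationship changes, and the wall-clock offset only needs
refreshing at boot and resume.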

There's a lot of code in there to deal with e.g. the host's TSC going
backwards (due to buggy hardware, but also due to suspend/hibernate).

> 2. To use the MSR for wall clock synchronization, shall we register KVM
> PV-clock as a higher rating clock source, so that it will be bound to
> tk_core.timekeeper and be read at each time of running
> update_wall_time() in each timer tick?
>
> 3. If the above is true, how can we keep the paravirtualized wall clock
> always updated? Is it always trapped to the hypervisor? I'm afraid this
> may cause performance loss. If there is no trap and the data is updated
> by the hypervisor periodically, how can we guarantee the accuracy?

We obviously don't want to be making frequent hypercalls (or other
traps) - so the idea is to provide the guest with a structure which is
updated when the host is aware something has changed. The guest then
reads this structure (using a version field to avoid races with the
host) and uses it to compute its own version of time.
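
As a rough sketch of what I mean by the version field: this is the usual
sequence-counter pattern, where the host makes the version odd while it is
updating and the guest retries if it saw an odd or changed version. The
structure layout and all names below are invented for illustration; this is
not a proposed ABI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared structure; fields are illustrative only. */
struct pv_time_info {
	_Atomic uint32_t version;   /* odd while the host is mid-update */
	uint64_t counter_at_update; /* counter value when last updated */
	uint64_t ns_at_update;      /* guest time (ns) at that counter value */
};

struct pv_time_snapshot {
	uint64_t counter_at_update;
	uint64_t ns_at_update;
};

/* Host side: bump to odd, write the payload, bump to even. */
static void pv_time_host_update(struct pv_time_info *ti,
				uint64_t counter, uint64_t ns)
{
	uint32_t v = atomic_load_explicit(&ti->version, memory_order_relaxed);

	assert((v & 1) == 0); /* single writer, so never already mid-update */
	atomic_store_explicit(&ti->version, v + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	ti->counter_at_update = counter;
	ti->ns_at_update = ns;
	atomic_store_explicit(&ti->version, v + 2, memory_order_release);
}

/*
 * Guest side: no trap or hypercall, just retry until a stable, even
 * version brackets the reads. Real code would use the platform's own
 * accessors and barriers for the payload reads.
 */
static struct pv_time_snapshot pv_time_guest_read(struct pv_time_info *ti)
{
	struct pv_time_snapshot s;
	uint32_t before, after;

	do {
		before = atomic_load_explicit(&ti->version,
					      memory_order_acquire);
		s.counter_at_update = ti->counter_at_update;
		s.ns_at_update = ti->ns_at_update;
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&ti->version,
					     memory_order_relaxed);
	} while ((before & 1) || before != after);

	return s;
}
```

The cost to the guest in the common case is two loads of the version plus
the payload reads, which is why this avoids the performance concern.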

> Meanwhile it seems easier to use KVM_KVMCLOCK_CTRL to get rid of the
> false-positive soft lockup panic, and guest can rely on cntvct for wall clock
> updating as it does now, and it seems not difficult for the hypervisor
> to keep cntvct "always on" and "monotonic".
> 
> Please let me know if I miss something.

Yes I'm coming around to that way of thinking - as long as the
hypervisor provides the "always on"/"monotonic" properties then really
the only missing bit is informing the guest when it was paused so it can
"do the right thing". This does appear to be what the x86 code is doing,
but I have to admit I struggle to fully understand it.
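
For what it's worth, the guest side of "do the right thing" could be as
simple as the sketch below: the host sets a flag when it pauses the guest,
and the watchdog consumes the flag and restarts its window instead of
reporting a lockup. This is only my guess at a minimal shape; the flag, the
structure and the function names are all hypothetical, and it assumes the
counter is monotonic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU watchdog state, for illustration only. */
struct sketch_watchdog {
	uint64_t last_touch_ns; /* last time the CPU was seen making progress */
	uint64_t threshold_ns;  /* how long without progress counts as stuck */
};

/* Hypothetical flag shared with the host: set when the guest was paused. */
static atomic_bool sketch_guest_was_paused;

/* Host side: record that the guest was stopped for a while. */
static void sketch_host_mark_paused(void)
{
	atomic_store(&sketch_guest_was_paused, true);
}

/* Guest side: returns true if a soft lockup should be reported. */
static bool sketch_watchdog_check(struct sketch_watchdog *wd, uint64_t now_ns)
{
	/* Relies on the "monotonic" property of the counter. */
	assert(now_ns >= wd->last_touch_ns);

	/* Consume the flag: a pause explains the gap, so restart the window. */
	if (atomic_exchange(&sketch_guest_was_paused, false)) {
		wd->last_touch_ns = now_ns;
		return false;
	}

	return now_ns - wd->last_touch_ns > wd->threshold_ns;
}
```

The point is that the counter keeps running (so wall-clock time stays
right) and only the watchdog needs to know about the pause.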

Steve

> Thanks,
> Heyi
> 
> On 2019/3/27 1:12, Steven Price wrote:
>> Hi Heyi,
>>
>> On 26/03/2019 13:53, Heyi Guo wrote:
>>> I also tested save/restore operations, and observed that clock in guest
>>> would not jump after restoring either. If we consider guest clock not
>>> being synchronized with real wall clock as an issue, does it mean
>>> save/restore function has the same issue?
>> Basically at the moment when the guest isn't running you have a choice
>> of two behaviours:
>>
>>   1. Stop (i.e. save/restore) CNTVCT - this means that the guest sees no
>> time occur. If the guest needs to have a concept of wall-clock time
>> (e.g. it communicates with other systems over a network) then this can
>> cause problems (e.g. timeouts might be wrong, certificates might start
>> appearing to be in the future etc).
>>
>>   2. Leave CNTVCT running - the guest sees the time pass but interprets
>> the vCPUs as effectively having locked up. Linux will trigger the soft
>> lockup warning.
>>
>> There are two ways of solving this, which match the two behaviours above:
>>
>>   1. Provide the guest with a view of wall-clock time. The obvious way of
>> doing this is with a pvclock implementation like MSR_KVM_WALL_CLOCK_NEW
>> for x86.
>>
>>   2. Inform the guest to ignore the apparent "soft-lockup". There's
>> already an ioctl for x86 for this: KVM_KVMCLOCK_CTRL
>>
>> My preference is for option 1 - as this gives the guest a good view of
>> both the time that it is actually executing (useful for internal
>> watchdog timers like the soft-lockup one in Linux) and maintains a view
>> of wall-clock time (useful when communicating with other external
>> services - e.g. a server on the internet). Your patch to QEMU
>> provides the first step of that, but as you mention there's much more
>> to do.
>>
>> One thing I haven't investigated in great detail is how KVM handles the
>> timer during various forms of suspend. In particular for suspend types
>> like full hibernation the host's physical counter will jump (quite
>> possibly backwards) - I haven't looked in detail how KVM presents this
>> to the guest. Hopefully not by making it go backwards!
>>
>> I'm not sure how much time I'm going to have to look at this in the near
>> future, but please keep me in the loop if you decide to tackle any of
>> this.
>>
>> Thanks,
>>
>> Steve
>>
>>
> 
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm