Re: [PATCH] kvmclock: Ensure time in migration never goes backward

Nick Thomas <nick@xxxxxxxxxxxxxx> · Wed, 07 May 2014 11:04:55 +0100

Hi all,

On 06/05/14 08:16, Alexander Graf wrote:
> 
> On 06.05.14 01:23, Marcelo Tosatti wrote:
>
>> 1) By what algorithm you retrieve
>> and compare time in kvmclock guest structure and KVM_GET_CLOCK.
>> What are the results of the comparison.
>> And whether and backwards time was visible in the guest.
> 
> I've managed to get my hands on a broken migration stream from Nick.
> There I looked at the curr_clocksource structure and saw that the last
> seen time on the kvmclock clock source was greater than the value that
> the kvmclock device migrated.

We've been seeing live migration failures where the guest sees time go
backwards (= massive forward leap to the kernel, apparently)  for a
while now, affecting perhaps 5-10% of migrations we'd do (usually a
large proportion of the migrations on a few hosts, rather than an even
spread); initially in December, when we tried an upgrade to QEMU 1.7.1
and a 3.mumble (3.10?) kernel, from 1.5.0 and Debian's 3.2.

My testing at the time seemed to indicate that either upgrade - qemu or
kernel - caused the problems to show up. Guest symptoms are that the
kernel enters a tight loop in __run_timers and stays there. In the end,
I gave up and downgraded us again without any clear idea of what was
happening, or why.

In April, we finally got together a fairly reliable test case. This
patch resolves the guest hangs in that test, and I've also been able to
conduct > 1000 migrations of production guests without seeing the issue
recur. So,

Tested-by: Nick Thomas <nick@xxxxxxxxxxxxxx>

/Nick

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html