Hi all,

On 06/05/14 08:16, Alexander Graf wrote:
>
> On 06.05.14 01:23, Marcelo Tosatti wrote:
>
>> 1) By what algorithm you retrieve
>> and compare time in kvmclock guest structure and KVM_GET_CLOCK.
>> What are the results of the comparison.
>> And whether and backwards time was visible in the guest.
>
> I've managed to get my hands on a broken migration stream from Nick.
> There I looked at the curr_clocksource structure and saw that the last
> seen time on the kvmclock clock source was greater than the value that
> the kvmclock device migrated.

We've been seeing live migration failures where the guest sees time go
backwards (= massive forward leap to the kernel, apparently) for a while
now, affecting perhaps 5-10% of migrations we'd do (usually a large
proportion of the migrations on a few hosts, rather than an even
spread); initially in December, when we tried an upgrade to QEMU 1.7.1
and a 3.mumble (3.10?) kernel, from 1.5.0 and Debian's 3.2.

My testing at the time seemed to indicate that either upgrade - qemu or
kernel - caused the problems to show up. Guest symptoms are that the
kernel enters a tight loop in __run_timers and stays there.

In the end, I gave up and downgraded us again without any clear idea of
what was happening, or why.

In April, we finally got together a fairly reliable test case. This
patch resolves the guest hangs in that test, and I've also been able to
conduct more than 1000 migrations of production guests without seeing
the issue recur. So,

Tested-by: Nick Thomas <nick@xxxxxxxxxxxxxx>

/Nick