Friendly ping.

On Wed, Dec 7, 2016 at 9:53 PM, Ruslan Habalov <3v0n1d3@xxxxxxxxx> wrote:
> Following Stefan's suggestion, I've investigated the possibility of
> problems with interrupts. Unfortunately, I can see no difference between
> the two scenarios in the growth of the interrupt counters (c.f.
> https://evonide.com/images/qemu_proc_interrupts.png). After a short
> conversation with him, I think we can exclude this as a problem source.
>
> Please let me know if you have any further advice on how I should
> proceed in order to triage this bug.
>
> On Wed, Dec 7, 2016 at 7:49 PM, Stefan Hajnoczi <stefanha@xxxxxxxxx> wrote:
>>
>> On Sun, Dec 4, 2016 at 10:52 PM, Ruslan Habalov <3v0n1d3@xxxxxxxxx> wrote:
>> > Hi everyone,
>> >
>> > I've been having the following qemu issue since about February on Xubuntu.
>> >
>> > Steps to reproduce:
>> > 1) Boot the host and boot the qemu guest - everything works perfectly
>> > 2) Shut down the qemu guest
>> > 3) Suspend and resume the host
>> > 4) Boot the qemu guest again - the guest is now extremely slow
>> >
>> > Current system setup:
>> > host: kernel 4.8.0-27-generic (also reproducible with older kernel
>> > versions)
>> > qemu guest: Linux or Windows, both affected (tested both)
>> >
>> > I've tried to debug the issue on my own, with the following results:
>> >
>> > ------------------------------------------------------------------
>> > https://www.evonide.com/images/qemu_problem.png
>> > - Output of "perf top -p $QEMU_PID" from guest boot to shutdown for
>> >   three different scenarios:
>> >   Left: normal guest behavior after a fresh boot of the host system
>> >   Middle: very slow guest after the host was suspended and resumed
>> >   Right: test run with HPET disabled (since I initially suspected
>> >   problems with HPET)
>> >
>> > https://www.evonide.com/images/qemu_hpet_investigation.png
>> > - Results of "perf record -g" and "perf report".
>> >   According to stefanha (qemu IRC) this doesn't seem to indicate any
>> >   error sources.
>> >
>> > https://www.evonide.com/images/qemu_vm_exit_healthy_vs_unhealthy.png
>> > - Results of running "perf record -a -e kvm:kvm_exit" and "perf
>> >   report" from guest boot to shutdown:
>> >   - Left: normal situation (after a fresh host boot)
>> >   - Right: bad situation (after suspend and resume on the host and
>> >     boot of the qemu guest)
>> >   perf reported ~300 out-of-order events for the "bad" scenario
>> >   (vs. only 2 for the "good" scenario).
>> >
>> > ------------------------------------------------------------------
>> > I was also recommended to check the host timers (e.g. /proc/timer_list).
>> > However, this file has more than 800 lines and I am unsure what to
>> > look for.
>> >
>> > Please let me know if you need further information on this issue.
>>
>> Unfortunately I don't see an obvious pattern in the kvm_exit reasons.
>> Maybe someone on kvm@xxxxxxxxxxxxxxx can help; I have CCed the list.
>>
>> One idea that might explain the read_hpet trace you collected is an
>> interrupt storm: an interrupt that fires continuously without doing
>> useful work. This could explain why the vcpu keeps halting (leading to
>> many ktime_get() calls on the host).
>>
>> You can check the interrupt-storm theory by collecting /proc/interrupts
>> on the host and inside the guest. If you compare the healthy system
>> against the slowed system, there may be an obvious difference in the
>> interrupt counters. That would let you track down which device keeps
>> firing an interrupt.
>>
>> Stefan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the
body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
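[Editor's note] Stefan's /proc/interrupts comparison can be automated rather than eyeballed. A minimal sketch of that idea follows; the snapshot strings below are illustrative made-up data, not counters from this thread, and the script itself is not part of the original exchange:

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: total_count}.

    Sums the per-CPU columns for each row; stops at the first
    non-numeric field (the interrupt chip / device description).
    """
    totals = {}
    for line in text.strip().splitlines()[1:]:  # skip the CPU header row
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts = []
        for f in fields[1:]:
            if f.isdigit():
                counts.append(int(f))
            else:
                break  # rest of the line describes the source, not counts
        if counts:
            totals[irq] = sum(counts)
    return totals


def interrupt_deltas(before, after):
    """Per-IRQ count growth between two snapshots, largest first."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    deltas = {irq: a[irq] - b.get(irq, 0) for irq in a}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative snapshots "taken" a few seconds apart: a storming IRQ
# (here IRQ 16) stands out by its disproportionate growth.
SNAP1 = """\
            CPU0       CPU1
   0:         45          0   IO-APIC    2-edge      timer
  16:       1200        800   IO-APIC   16-fasteoi   ehci_hcd:usb1
 LOC:      90000      88000   Local timer interrupts
"""
SNAP2 = """\
            CPU0       CPU1
   0:         45          0   IO-APIC    2-edge      timer
  16:     901200     900800   IO-APIC   16-fasteoi   ehci_hcd:usb1
 LOC:      91000      89000   Local timer interrupts
"""

if __name__ == "__main__":
    for irq, delta in interrupt_deltas(SNAP1, SNAP2):
        print(f"IRQ {irq}: +{delta}")
```

In practice one would capture the two snapshots a fixed interval apart (e.g. `cat /proc/interrupts; sleep 5; cat /proc/interrupts`) on both host and guest, for the healthy and the slow case; an IRQ whose delta dwarfs all others is the storm candidate.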