Re: qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

Hi Mike,

you might be the guy StefanHa was referring to on the qemu-devel mailing-list.

I just ran some more tests, so here goes…

On 02.08.2013 at 23:47, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:

> Oliver,
> 
> We've had a similar situation occur. For about three months, we've run several Windows 2008 R2 guests with virtio drivers that record video surveillance. We have long suffered an issue where the guest appears to hang indefinitely (or until we intervene). For the sake of this conversation, we call this state "wedged", because it appears something (rbd, qemu, virtio, etc) gets stuck on a deadlock. When a guest gets wedged, we see the following:
> 
> - the guest will not respond to pings

When the hung_task message shows up, I can still ping the guest and open new ssh sessions; only the session running the while loop no longer accepts any keyboard input.
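
For reference, the hung-task warning can be observed inside the guest roughly like this (a minimal sketch; 120 seconds is the usual default for the timeout):

dmesg | grep -i hung_task
sysctl kernel.hung_task_timeout_secs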

> - the qemu-system-x86_64 process drops to 0% cpu
> - graphite graphs show the interface traffic dropping to 0bps
> - the guest will stay wedged forever (or until we intervene)
> - strace of qemu-system-x86_64 shows QEMU is making progress [1][2]
> 

nothing special here:

5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}, {fd=19, events=POLLIN}, {fd=15, events=POLLIN}, {fd=4, events=POLLIN}], 11, -1) = 1 ([{fd=12, revents=POLLIN}])
[pid 11793] read(5, 0x7fff16b61f00, 16) = -1 EAGAIN (Resource temporarily unavailable)
[pid 11793] read(12, "\2\0\0\0\0\0\0\0\0\0\0\0\0\361p\0\252\340\374\373\373!gH\10\0E\0\0Yq\374"..., 69632) = 115
[pid 11793] read(12, 0x7f0c1737fcec, 69632) = -1 EAGAIN (Resource temporarily unavailable)
[pid 11793] poll([{fd=27, events=POLLIN|POLLERR|POLLHUP}, {fd=26, events=POLLIN|POLLERR|POLLHUP}, {fd=24, events=POLLIN|POLLERR|POLLHUP}, {fd=12, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=

and the same for many, many threads.
Inside the VM I see 75% I/O wait, but I can restart the spew test in a second session.
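
The load itself is roughly along these lines (just a dd-based sketch to illustrate the I/O pattern, not the exact spew command line):

while true; do
    dd if=/dev/zero of=/root/testfile bs=1M count=2048 oflag=direct conv=fsync
done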

All of that was tested with rbd_cache=false and cache=none.
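
In other words, the guests are started with a drive line along these lines (pool and image names are placeholders; if I remember correctly, cache=none should imply rbd_cache=false in newer qemu anyway, I just set it explicitly to be sure):

qemu-system-x86_64 ... \
    -drive format=raw,if=virtio,cache=none,file=rbd:pool/image:rbd_cache=false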

I also test every qemu version with a 2-CPU, 2 GiB Windows 7 VM under fairly high load, and have encountered no problem so far. It runs smoothly and fast.

> We can "un-wedge" the guest by opening a NoVNC session or running a 'virsh screenshot' command. After that, the guest resumes and runs as expected. At that point we can examine the guest. Each time we'll see:
> 
> - No Windows error logs whatsoever while the guest is wedged
> - A time sync typically occurs right after the guest gets un-wedged
> - Scheduled tasks do not run while wedged
> - Windows error logs do not show any evidence of suspend, sleep, etc
> 
> We had so many issues with guests becoming wedged that we wrote a script to 'virsh screenshot' them via cron. Then we installed some updates and had a month or so of higher stability (wedging happened maybe 1/10th as often). Until today we couldn't figure out why.
> 
> Yesterday, I realized qemu was starting the instances without specifying cache=writeback. We corrected that, and let them run overnight. With RBD writeback re-enabled, wedging came back as often as we had seen in the past. I've counted ~40 occurrences in the past 12-hour period. So I feel like writeback caching in RBD certainly makes the deadlock more likely to occur.
> 
> Joshd asked us to gather RBD client logs:
> 
> "joshd> it could very well be the writeback cache not doing a callback at some point - if you could gather logs of a vm getting stuck with debug rbd = 20, debug ms = 1, and debug objectcacher = 30 that would be great"
> 
> We'll do that over the weekend. If you could as well, we'd love the help!
> 
> [1] http://www.gammacode.com/kvm/wedged-with-timestamps.txt
> [2] http://www.gammacode.com/kvm/not-wedged.txt
> 

As I wrote above, no cache is in use so far, so I'm omitting the verbose debugging for the moment. But I will enable it if requested.
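
For the record, once I enable it I would put the requested settings into the [client] section of ceph.conf on the qemu host, roughly like this (the log file path is just an example):

[client]
    debug rbd = 20
    debug ms = 1
    debug objectcacher = 30
    log file = /var/log/ceph/client.$pid.log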

Thanks for your report,

Oliver.

> Thanks,
> 
> Mike Dawson
> Co-Founder & Director of Cloud Architecture
> Cloudapt LLC
> 6330 East 75th Street, Suite 170
> Indianapolis, IN 46250
> 
> On 8/2/2013 6:22 AM, Oliver Francke wrote:
>> Well,
>> 
>> I believe I'm the winner of buzzword bingo for today.
>> 
>> But seriously speaking... since I don't see this particular problem with
>> qcow2 on kernel 3.2, nor with qemu-1.2.2, nor with newer kernels, I hope
>> I'm not alone here?
>> We have a rising number of tickets from people reinstalling from ISOs
>> with the 3.2 kernel.
>> 
>> A quick fallback is to start all VMs with qemu-1.2.2, but we then lose
>> some features such as the latency-free RBD cache ;)
>> 
>> I just opened a bug for qemu per:
>> 
>> https://bugs.launchpad.net/qemu/+bug/1207686
>> 
>> with all dirty details.
>> 
>> Installing a 3.9.x backport kernel or upgrading the Ubuntu kernel to 3.8.x
>> "fixes" it. So I assume we have a bad combination for all distros with a
>> 3.2 kernel and rbd as the storage backend.
>> 
>> Any similar findings?
>> Any idea of tracing/debugging ( Josh? ;) ) very welcome,
>> 
>> Oliver.
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




