On Mon, 5 Aug 2013, Mike Dawson wrote:
> Josh,
>
> Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock.
>
> - At about 2013-08-05 19:46 or 47, we hit the issue, traffic went to 0
> - At about 2013-08-05 19:53:51, ran a 'virsh screenshot'
>
>
> Environment is:
>
> - Ceph 0.61.7 (client is co-mingled with three OSDs)
> - rbd cache = true and cache=writeback
> - qemu 1.4.0 1.4.0+dfsg-1expubuntu4
> - Ubuntu Raring with 3.8.0-25-generic
>
> This issue is reproducible in my environment, and I'm willing to run any wip
> branch you need. What else can I provide to help?

This looks like a different issue than Oliver's. I see one anomaly in the
log, where an rbd I/O completion is triggered a second time for no apparent
reason. I opened a separate bug

    http://tracker.ceph.com/issues/5955

and pushed wip-5955 that will hopefully shine some light on the weird
behavior I saw. Can you reproduce with this branch and the following debug
settings?

    debug objectcacher = 20
    debug ms = 1
    debug rbd = 20
    debug finisher = 20

Thanks!
sage

>
> Thanks,
> Mike Dawson
>
>
> On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote:
> > On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
> > > On 02.08.2013 at 23:47, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
> > > > We can "un-wedge" the guest by opening a NoVNC session or running a
> > > > 'virsh screenshot' command. After that, the guest resumes and runs as
> > > > expected. At that point we can examine the guest. Each time we'll see:
> >
> > If virsh screenshot works then this confirms that QEMU itself is still
> > responding. Its main loop cannot be blocked since it was able to
> > process the screendump command.
> >
> > This supports Josh's theory that a callback is not being invoked. The
> > virtio-blk I/O request would be left in a pending state.
> >
> > Now here is where the behavior varies between configurations:
> >
> > On a Windows guest with 1 vCPU, you may see the symptom that the guest no
> > longer responds to ping.
> >
> > On a Linux guest with multiple vCPUs, you may see the hung task message
> > from the guest kernel because other vCPUs are still making progress.
> > Just the vCPU that issued the I/O request and whose task is in
> > UNINTERRUPTIBLE state would really be stuck.
> >
> > Basically, the symptoms depend not just on how QEMU is behaving but also
> > on the guest kernel and how many vCPUs you have configured.
> >
> > I think this can explain how both problems you are observing, Oliver and
> > Mike, are a result of the same bug. At least I hope they are :).
> >
> > Stefan
> >

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com