Hi Oliver,

(Posted this on the bug too, but:)

Your last log revealed a bug in the librados aio flush. A fix is pushed to
wip-librados-aio-flush (bobtail) and wip-5919 (master); can you retest
please (with caching off again)?

Thanks!
sage


On Fri, 9 Aug 2013, Oliver Francke wrote:

> Hi Josh,
>
> just opened
>
> http://tracker.ceph.com/issues/5919
>
> with all collected information incl. debug-log.
>
> Hope it helps,
>
> Oliver.
>
> On 08/08/2013 07:01 PM, Josh Durgin wrote:
> > On 08/08/2013 05:40 AM, Oliver Francke wrote:
> > > Hi Josh,
> > >
> > > I have a session logged with:
> > >
> > > debug_ms=1:debug_rbd=20:debug_objectcacher=30
> > >
> > > as you requested from Mike, even if I think, we do have another story
> > > here, anyway.
> > >
> > > Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
> > > 3.2.0-51-amd...
> > >
> > > Do you want me to open a ticket for that stuff? I have about 5MB
> > > compressed logfile waiting for you ;)
> >
> > Yes, that'd be great. If you could include the time when you saw the guest
> > hang that'd be ideal. I'm not sure if this is one or two bugs,
> > but it seems likely it's a bug in rbd and not qemu.
> >
> > Thanks!
> > Josh
> >
> > > Thnx in advance,
> > >
> > > Oliver.
> > >
> > > On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:
> > > > On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
> > > > > On 02.08.2013 at 23:47, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
> > > > > > We can "un-wedge" the guest by opening a NoVNC session or running a
> > > > > > 'virsh screenshot' command. After that, the guest resumes and runs
> > > > > > as expected. At that point we can examine the guest. Each time we'll
> > > > > > see:
> > > > If virsh screenshot works then this confirms that QEMU itself is still
> > > > responding. Its main loop cannot be blocked since it was able to
> > > > process the screendump command.
> > > >
> > > > This supports Josh's theory that a callback is not being invoked. The
> > > > virtio-blk I/O request would be left in a pending state.
> > > >
> > > > Now here is where the behavior varies between configurations:
> > > >
> > > > On a Windows guest with 1 vCPU, you may see the symptom that the guest
> > > > no longer responds to ping.
> > > >
> > > > On a Linux guest with multiple vCPUs, you may see the hung task message
> > > > from the guest kernel because other vCPUs are still making progress.
> > > > Just the vCPU that issued the I/O request and whose task is in
> > > > UNINTERRUPTIBLE state would really be stuck.
> > > >
> > > > Basically, the symptoms depend not just on how QEMU is behaving but also
> > > > on the guest kernel and how many vCPUs you have configured.
> > > >
> > > > I think this can explain how both problems you are observing, Oliver and
> > > > Mike, are a result of the same bug. At least I hope they are :).
> > > >
> > > > Stefan
>
> --
>
> Oliver Francke
>
> filoo GmbH
> Moltkestraße 25a
> 33330 Gütersloh
> HRB4355 AG Gütersloh
>
> Managing directors: J.Rehpöhler | C.Kunz
>
> Follow us on Twitter: http://twitter.com/filoogmbh

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com