Hi James,

That doesn't sound like a fun one to debug. I'll try your messaging
stack size tweak after the current (super ugly) hack experiment, to be
described next....

Thanks-

John

On 10/28/2013 11:11 PM, James Harper wrote:
> Maybe nothing to do with your issue, but I was having problems using librbd with blktap, and ended up adding:
>
> [client]
> ms rwthread stack bytes = 8388608
>
> to my config. This is a workaround, not a fix though (IMHO) as there is nothing to indicate that librbd is running out of stack space, rather that stack is being clobbered and this works around it. I spent a fair bit of time trying to debug it but could never pin it down.
>
> James
>
>> -----Original Message-----
>> From: ceph-users-bounces@xxxxxxxxxxxxxx
>> [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of John Morris
>> Sent: Tuesday, 29 October 2013 6:01 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Ceph + Xen - RBD io hang
>>
>> I'm encountering a problem with RBD-backed Xen. During a VM boot,
>> pygrub attaches the VM's root VDI to dom0. This hangs with these
>> messages in the debug log:
>>
>> Oct 27 21:19:59 xen27 kernel:
>> vbd vbd-51728: 16 Device in use; refusing to close
>> Oct 27 21:19:59 xen27 xenopsd-xenlight:
>> [xenops] waiting for backend to close
>> Oct 27 21:19:59 xen27 kernel:
>> qemu-system-i38[2899]: segfault at 7fac042e4000 ip 00007fac0447b129
>> sp 00007fffe7028630 error 4 in qemu-system-i386[7fac042ed000+309000]
>>
>> More details here:
>>
>> http://pastebin.ca/2472234
>>
>> - Scientific Linux 6
>> - 64-bit, Phenom CPU
>> - Ceph from RPM ceph-0.67.4-0.el6.x86_64
>> - XenAPI from Dave Scott's technology preview
>> - two btrfs-backed OSDs with journals on separate drives
>> - various kernels, incl. 3.4.6 from Dave Scott's repo and 3.11.6
>>   from elrepo.org.
>>
>> This thread (whose Subject: I borrowed) describes what I'm seeing quite
>> well, but no resolution was posted:
>>
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/3636
>>
>> In my case, udevd starts a 'blkid' process that holds /dev/xvdb open.
>> Like in James's case, any interaction with the device will hang, and
>> usually can't be killed. This same problem prevents the machine from
>> completing shutdown.
>>
>> In that thread, Sylvain Munaut says the OSD and kernel driver shouldn't
>> be run in the same host. I believe my setup does not violate that,
>> since the rbd kernel module is not loaded, and the device is
>> attached through the xen_blkfront module instead.
>>
>> Thanks-
>>
>> John
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com