Re: Ceph + Xen - RBD io hang

Hi James,

That doesn't sound like a fun one to debug.  I'll try your messaging
stack size tweak after the current (super ugly) hack experiment, to be
described next....

Thanks-

	John


On 10/28/2013 11:11 PM, James Harper wrote:
> Maybe nothing to do with your issue, but I was having problems using librbd with blktap, and ended up adding:
> 
> [client]
>   ms rwthread stack bytes = 8388608
> 
> to my config. This is a workaround rather than a fix (IMHO): nothing indicates that librbd is actually running out of stack space. More likely the stack is being clobbered, and the larger stack merely hides it. I spent a fair bit of time trying to debug it but could never pin it down.
> 
> James
> 
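For reference, the value in James's snippet works out to 8 MiB. Below is a minimal sketch that reads the setting and converts it. It assumes Python's generic configparser is close enough to Ceph's INI-style syntax for this one fragment (Ceph has its own parser, so this is only a sanity check); the CONF_TEXT string and the stack_bytes_from_conf helper are hypothetical names used for illustration.

```python
import configparser

# The [client] fragment quoted above, as a string (illustration only).
CONF_TEXT = """\
[client]
ms rwthread stack bytes = 8388608
"""


def stack_bytes_from_conf(text):
    """Return the 'ms rwthread stack bytes' value from the [client] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("client", "ms rwthread stack bytes")


stack_bytes = stack_bytes_from_conf(CONF_TEXT)
stack_mib = stack_bytes / (1024 * 1024)
print(f"{stack_bytes} bytes = {stack_mib:g} MiB")
```
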
>> -----Original Message-----
>> From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of John Morris
>> Sent: Tuesday, 29 October 2013 6:01 AM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject:  Ceph + Xen - RBD io hang
>>
>> I'm encountering a problem with RBD-backed Xen.  During a VM boot,
>> pygrub attaches the VM's root VDI to dom0.  This hangs with these
>> messages in the debug log:
>>
>> Oct 27 21:19:59 xen27 kernel:
>>   vbd vbd-51728: 16 Device in use; refusing to close
>> Oct 27 21:19:59 xen27 xenopsd-xenlight:
>>   [xenops] waiting for backend to close
>> Oct 27 21:19:59 xen27 kernel:
>>   qemu-system-i38[2899]: segfault at 7fac042e4000 ip 00007fac0447b129
>>   sp 00007fffe7028630 error 4 in qemu-system-i386[7fac042ed000+309000]
>>
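A note on reading the segfault line above: the bracketed pair after the binary name is the mapping's base address and size, so subtracting the base from the instruction pointer gives an offset into qemu-system-i386 that could be fed to a symbolizer. The sketch below does that arithmetic and decodes the x86 page-fault error bits. The regular expression and the decode_segfault helper are my own illustration, not part of any tool, and assume the log format shown in this message.

```python
import re

# The kernel message from the log above, joined onto one line.
LOG = ("qemu-system-i38[2899]: segfault at 7fac042e4000 ip 00007fac0447b129 "
       "sp 00007fffe7028630 error 4 in qemu-system-i386[7fac042ed000+309000]")

# All numeric fields in this message are printed in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Parse one kernel segfault line into offsets and error-code flags."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    addr, ip, err, base, size = (
        int(m.group(name), 16) for name in ("addr", "ip", "err", "base", "size")
    )
    return {
        "object": m.group("obj"),
        "ip_offset": ip - base,                    # offset of faulting instruction
        "ip_in_mapping": 0 <= ip - base < size,
        "fault_addr_below_base": base - addr,      # positive: access below mapping
        "page_present": bool(err & 1),             # bit 0: protection fault if set
        "write": bool(err & 2),                    # bit 1: write access if set
        "user_mode": bool(err & 4),                # bit 2: fault came from user mode
    }


result = decode_segfault(LOG)
for key, value in result.items():
    shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
    print(f"{key}: {shown}")
```

For this log line, error 4 decodes to a user-mode read of a page that is not mapped, and the faulting address sits slightly below the start of the binary's mapping.
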
>> More details here:
>>
>> http://pastebin.ca/2472234
>>
>>   - Scientific Linux 6
>>   - 64-bit, Phenom CPU
>>   - Ceph from RPM ceph-0.67.4-0.el6.x86_64
>>   - XenAPI from Dave Scott's technology preview
>>   - two btrfs-backed OSDs with journals on separate drives
>>   - various kernels, incl. 3.4.6 from Dave Scott's repo and 3.11.6
>>     from elrepo.org.
>>
>> This thread (whose Subject: I borrowed) describes what I'm seeing quite
>> well, but no resolution was posted:
>>
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/3636
>>
>> In my case, udevd starts a 'blkid' process that holds /dev/xvdb open.
>> As in James's case, any interaction with the device hangs, and the
>> hung process usually can't be killed.  This same problem prevents the
>> machine from completing shutdown.
>>
>> In that thread, Sylvain Munaut says the OSD and kernel driver shouldn't
>> be run on the same host.  I believe my setup does not violate that,
>> since the rbd kernel module is not loaded; the device is attached
>> through the xen_blkfront module instead.
>>
>> Thanks-
>>
>> 	John
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com