Re: Ceph + Xen - RBD io hang


 



The suspicious line in /var/log/debug (see the pastebin below) and the
fact that 'blkid' was the culprit keeping the device open looked like
juicy clues:

  kernel: vbd vbd-51728: 16 Device in use; refusing to close

Search results:

  https://www.redhat.com/archives/libguestfs/2012-February/msg00023.html

  https://rwmj.wordpress.com/2012/01/19/udev-unexpectedness/#content

blkid is started by udevd after block device changes.  The theory is
that blkid holds the device open while it runs, and that xenops fails
to close the device after pygrub if blkid hasn't finished by then.
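
One way to check that theory is to list which processes have the device
open at the moment the close fails.  The following is only a sketch: it
assumes a Linux /proc layout, and the name holders_of and the proc_root
parameter are made up for illustration, not taken from any tool in this
thread.  It reads symlinks under /proc/<pid>/fd and never opens the
device itself.

```python
import os


def holders_of(device, proc_root="/proc"):
    """Return a list of (pid, command name) for processes holding `device` open.

    Hypothetical helper: scans <proc_root>/<pid>/fd for symlinks that
    point at the device node.  Run as root to see other users' processes.
    """
    target = os.path.realpath(device)
    holders = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed while we were looking
            if link == target:
                try:
                    with open(os.path.join(proc_root, entry, "comm")) as f:
                        name = f.read().strip()
                except (IOError, OSError):
                    name = "?"
                holders.append((int(entry), name))
                break  # report each process once
    return holders
```

Calling holders_of("/dev/xvdb") while the VBD refuses to close should
show a blkid entry if the theory holds.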

The links above suggest avoiding the condition by running 'udevadm
settle' before closing the device.

For a cheap hack, I added 'subprocess.call(("/sbin/udevadm", "settle"))'
at the end of the pygrub script.
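
A slightly more defensive version of that one-liner might look like the
sketch below.  The function names, the 30-second timeout and the choice
to swallow a missing binary are assumptions for illustration and are
not part of pygrub; 'udevadm settle --timeout=N' is the only external
interface used.

```python
import subprocess


def settle_command(udevadm="/sbin/udevadm", timeout=30):
    """Build the argv for 'udevadm settle' (timeout in seconds)."""
    return [udevadm, "settle", "--timeout=%d" % timeout]


def settle_udev(udevadm="/sbin/udevadm", timeout=30):
    """Wait for the udev event queue to drain.

    Returns True if udevadm exited with status 0, False if it returned
    a nonzero status (for example on timeout) or could not be executed.
    """
    try:
        return subprocess.call(settle_command(udevadm, timeout)) == 0
    except OSError:
        # udevadm missing or not executable: nothing we can wait for.
        return False
```

Calling settle_udev() as the last step before the script exits gives
blkid a chance to finish before the device is closed.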

Several VMs *in a row* successfully started with no failures, for the
first time since I pulled the host's root filesystem off the OSDs.

In terms of the udev theory, perhaps the extra load on the disk shared
by both OS and OSD was slowing things down enough that the race
condition was rarely triggered.  After I put the OS on a separate disk,
the system was noticeably snappier, and the race was triggered so often
that it was difficult to boot two VMs in a row (annoying when a single
failure required the machine to be power-cycled to recover!).

	John



On 10/28/2013 02:01 PM, John Morris wrote:
> I'm encountering a problem with RBD-backed Xen.  During a VM boot,
> pygrub attaches the VM's root VDI to dom0.  This hangs with these
> messages in the debug log:
> 
> Oct 27 21:19:59 xen27 kernel:
>   vbd vbd-51728: 16 Device in use; refusing to close
> Oct 27 21:19:59 xen27 xenopsd-xenlight:
>   [xenops] waiting for backend to close
> Oct 27 21:19:59 xen27 kernel:
>   qemu-system-i38[2899]: segfault at 7fac042e4000 ip 00007fac0447b129
>   sp 00007fffe7028630 error 4 in qemu-system-i386[7fac042ed000+309000]
> 
> More details here:
> 
> http://pastebin.ca/2472234
> 
>   - Scientific Linux 6
>   - 64-bit, Phenom CPU
>   - Ceph from RPM ceph-0.67.4-0.el6.x86_64
>   - XenAPI from Dave Scott's technology preview
>   - two btrfs-backed OSDs with journals on separate drives
>   - various kernels, incl. 3.4.6 from Dave Scott's repo and 3.11.6
>     from elrepo.org.
> 
> This thread (whose Subject: I borrowed) describes what I'm seeing quite
> well, but no resolution was posted:
> 
> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/3636
> 
> In my case, udevd starts a 'blkid' process that holds /dev/xvdb open.
> Like in James's case, any interaction with the device will hang, and
> usually can't be killed.  This same problem prevents the machine from
> completing shutdown.
> 
> In that thread, Sylvain Munaut says the OSD and kernel driver shouldn't
> be run on the same host.  I believe my setup does not violate that,
> since the rbd kernel module is not loaded, and the device is attached
> through the xen_blkfront module instead.
> 
> Thanks-
> 
> 	John
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



