On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:
> We've been working on a storage repository for XenServer 6.5, which uses
> the 3.10 kernel (ugh). I got the XenServer guys to include the rbd and
> libceph kernel modules in the 6.5 release, so that's at least available.
>
> Where things go bad is when we have many (>10 or so) VMs on one host,
> all using RBD clones for storage, mapped using the rbd kernel module.
> The XenServer crashes so badly that it doesn't even get a chance to
> kernel panic. The whole box just hangs.

I'm not very familiar with Xen or ways to debug it, but if the problem
lies in the libceph or rbd kernel modules, we'd like to fix it. Perhaps
try grabbing a vmcore? If the box just hangs and doesn't panic, you can
normally induce a crash with a sysrq (see the sketch appended below).

> Has anyone else seen this sort of behavior?
>
> We have a lot of ways to try to work around this, but none of them are
> very pretty:
>
> * Move the code to user space and ditch the kernel driver: the build
> tools for XenServer are all CentOS 5 based, and it is painful to get
> all of the dependencies needed to build the Ceph user-space libs.
>
> * Backport the ceph and rbd kernel modules to 3.10. This has proven
> painful, as the block device code changed somewhere in the 3.14-3.16
> timeframe.

The https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3
branch would be a good start - it has libceph.ko and rbd.ko as of
3.18-rc5 backported to RHEL 7 (which is based on 3.10), and it may be
updated in the future as well, although no promises on that.

Thanks,

                Ilya
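
To make the sysrq suggestion above concrete, here is a minimal sketch
(Python, run as root on the dom0) of enabling sysrq and forcing a crash
so a pre-configured kdump kernel can capture a vmcore. It assumes
kdump/crashkernel is already set up; otherwise the box simply resets
without saving anything. The procfs paths are the standard sysrq
interfaces, everything else is illustrative:

    #!/usr/bin/env python
    # Sketch: force a kernel crash via sysrq so kdump can save a vmcore.
    # Assumes kdump is configured; otherwise the machine just reboots.

    # Enable all sysrq functions.
    with open("/proc/sys/kernel/sysrq", "w") as f:
        f.write("1\n")

    # Trigger an immediate kernel crash ('c') -- last resort, the host
    # goes down right here; kdump then writes the vmcore on the way down.
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("c\n")

If no shell is reachable but the keyboard still responds, the
Alt-SysRq-c key sequence (or a serial console break followed by 'c')
does the same thing without needing to run anything.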
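
For the "move the code to user space" workaround, the user-space path
goes through librbd/librados instead of the rbd kernel module. Below is
a minimal sketch using the python-rbd bindings; the conf path, pool
name, and image name are placeholders, and it of course depends on
exactly the user-space Ceph libs that are painful to build on CentOS 5:

    #!/usr/bin/env python
    # Sketch: access an RBD image through librbd (user space) rather
    # than mapping it with the rbd kernel module.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("rbd")          # pool name is an assumption
        try:
            image = rbd.Image(ioctx, "vm-disk-0")  # image name is an assumption
            try:
                print("image size: %d bytes" % image.size())
                data = image.read(0, 4096)         # read the first 4 KiB
                print("read %d bytes" % len(data))
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()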