Thanks for the pointer to the patched kernel. I'll give that a shot.
On Thu, Apr 9, 2015, 5:56 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards <lesser.evil@xxxxxxxxx> wrote:
> We've been working on a storage repository for XenServer 6.5, which uses the
> 3.10 kernel (ugh). I got the XenServer guys to include the rbd and libceph
> kernel modules in the 6.5 release, so that's at least available.
>
> Where things go bad is when we have many (>10 or so) VMs on one host, all
> using RBD clones for storage, mapped using the rbd kernel module. The
> XenServer host crashes so badly that it doesn't even get a chance to kernel
> panic. The whole box just hangs.
I'm not very familiar with Xen and ways to debug it, but if the problem
lies in the libceph or rbd kernel modules, we'd like to fix it. Perhaps try
grabbing a vmcore? If it just hangs and doesn't panic, you can normally
induce a crash with sysrq.
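With kdump configured and sysrq enabled, something along these lines should
do it - untested under a Xen dom0, so treat it as a sketch:

    # allow all sysrq functions
    echo 1 > /proc/sys/kernel/sysrq
    # force an immediate crash; kdump should then capture a vmcore
    echo c > /proc/sysrq-trigger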
>
> Has anyone else seen this sort of behavior?
>
> We have a lot of ways to try to work around this, but none of them are very
> pretty:
>
> * move the code to user space, ditch the kernel driver: the build tools for
> XenServer are all CentOS 5 based, and it is painful to build all of the
> dependencies needed for the Ceph user-space libraries.
>
> * backport the ceph and rbd kernel modules to 3.10. This has proven painful,
> as the block device code changed somewhere in the 3.14-3.16 timeframe.
The https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3 branch
would be a good start - it has libceph.ko and rbd.ko as of 3.18-rc5
backported to rhel7 (which is based on 3.10) and may be updated in the
future as well, although no promises on that.
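Building just the two modules out of that tree is the usual drill; roughly
(the config handling and paths are assumptions, not tested against the
XenServer build environment):

    git clone -b rhel7-3.10.0-123.9.3 https://github.com/ceph/ceph-client.git
    cd ceph-client
    # start from the running kernel's config
    cp /boot/config-$(uname -r) .config
    make olddefconfig && make modules_prepare
    # build only libceph.ko and rbd.ko
    make M=net/ceph modules
    make M=drivers/block modules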
Thanks,
Ilya