We've been working on a storage repository for XenServer 6.5, which uses the 3.10 kernel (ugh). I got the XenServer guys to include the rbd and libceph kernel modules in the 6.5 release, so that's at least available.
Where things go bad is when we have many (>10 or so) VMs on one host, all using RBD clones for storage, mapped through the rbd kernel module. XenServer crashes so badly that it doesn't even get a chance to kernel panic; the whole box just hangs.
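For context, each VM disk ends up on the host roughly like the sketch below: an RBD clone of a snapshot is created and then mapped via the kernel module. This is only an illustration of the workflow, not our actual code; the pool, parent image, and snapshot names are made up.

    #!/usr/bin/env python
    # Rough sketch of how each VM disk gets onto the host: clone a
    # protected snapshot, then map the clone through the rbd kernel
    # module. Pool/image/snapshot names below are hypothetical.
    import subprocess

    POOL = "xen-sr"            # hypothetical pool name
    PARENT = "golden-image"    # hypothetical parent image
    SNAP = "base"              # hypothetical protected snapshot

    def provision_vm_disk(vm_name):
        clone = "%s-disk0" % vm_name
        # Create a copy-on-write clone of the snapshot for this VM.
        subprocess.check_call(["rbd", "clone",
                               "%s/%s@%s" % (POOL, PARENT, SNAP),
                               "%s/%s" % (POOL, clone)])
        # Map it via the rbd kernel module; recent rbd CLIs print the
        # resulting /dev/rbdX device on stdout.
        dev = subprocess.check_output(["rbd", "map",
                                       "%s/%s" % (POOL, clone)])
        return dev.strip()

    if __name__ == "__main__":
        # With more than 10 or so of these mapped on one host,
        # the box hangs.
        for i in range(12):
            print(provision_vm_disk("vm%02d" % i))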
Has anyone else seen this sort of behavior?
We have a lot of ways to try to work around this, but none of them are very pretty:
* move the code to user space and ditch the kernel driver: the build tools for XenServer are all CentOS 5 based, and it is painful to get all of the dependencies built so we can build the Ceph user-space libraries (see the sketch after this list).
* backport the ceph and rbd kernel modules to 3.10: has proven painful, as the block device code changed somewhere in the 3.14-3.16 timeframe.
* forward-port the Xen kernel patches from 3.10 to a newer kernel (3.18 preferred) and run that on XenServer: painful for the same reasons as above, but in the opposite direction.
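For the first option, the idea is to talk to the cluster through librados/librbd instead of the kernel client. A minimal sketch is below, assuming the python-rados and python-rbd bindings could be built (which is exactly the dependency chain that is painful on the CentOS 5 based build tools); the pool and image names are made up.

    # Sketch of the user-space path: access an RBD image through
    # librados/librbd, with no /dev/rbdX and no kernel client involved.
    # Pool and image names are hypothetical.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("xen-sr")        # hypothetical pool
        try:
            image = rbd.Image(ioctx, "vm01-disk0")  # hypothetical clone
            try:
                # Read and write directly through librbd.
                image.write(b"\0" * 4096, 0)
                data = image.read(0, 4096)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()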
Any and all suggestions are welcome.