Hi Szabolcs,

2012/1/6 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> On 2011. December 29. 20:58:00 Florian Haas wrote:
>> please consider reviewing the following patches. These add
>> OCF-compliant cluster resource agent functionality to Ceph, allowing
>> MDS, OSD and MON to run as cluster resources under compliant managers
>> (such as Pacemaker, http://www.clusterlabs.org).
>
> Nice work, however, I don't really see the point of running Ceph in a HA
> cluster. If you have more than one machine, then why not deploy Ceph as an
> active-active (Ceph) cluster? If you want an active-backup cluster, then why
> use Ceph?

What makes you think that Pacemaker doesn't support active/active?

> There might be situations however, where this feature can come handy, although
> I can't think of any right now. Can you sketch up one?

As far as I'm informed, there is currently no "official" method of recovering Ceph daemons in place when they die, and I can see two ways of achieving that (correct me if I'm wrong):

1. systemd integration. systemd, via Restart=on-failure (or maybe even Restart=always) in a .service definition, could recover a failed daemon. As far as I can see, such service definitions don't exist for any of the Ceph daemons, and systemd is not exactly my home turf, so I can't really contribute to Ceph/systemd integration. That said, systemd is currently far from ubiquitous, and specifically in the Debian/Ubuntu corner I wonder whether we're going to see widespread systemd adoption anytime soon. (It would still be nice to have, needless to say.)

2. Pacemaker integration. Pacemaker can recover daemons in place via its monitor operations and automatic resource recovery, and it has no systemd dependency (it doesn't need to interface with any init daemon for resource management, really). Pacemaker is available across all Linux distros, today.
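To illustrate option (1), here is a minimal sketch of what a systemd unit with in-place recovery might look like. This is purely hypothetical -- no official unit files exist at the time of writing, and the file path is an assumption; only the ceph-osd -i/-f flags and the Restart= directive are standard:

```
# /etc/systemd/system/ceph-osd@.service -- hypothetical sketch, not an
# official unit file; the path and Description are assumptions.
[Unit]
Description=Ceph object storage daemon (osd.%i)
After=network.target

[Service]
# -f keeps the daemon in the foreground so systemd can supervise it;
# -i selects the OSD instance from the template parameter.
ExecStart=/usr/bin/ceph-osd -f -i %i
# Recover the daemon in place whenever it exits abnormally.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```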
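And for option (2), a crm shell sketch of what the OSD resource could look like under Pacemaker, with a monitor operation driving in-place recovery. The resource agent name (ocf:ceph:osd) and the timeouts are assumptions based on the patches under review:

```
# crm configure sketch -- agent name/provider and timeouts are assumptions.
primitive p_ceph-osd ocf:ceph:osd \
    op monitor interval="30s" timeout="30s" \
    op start timeout="120s" \
    op stop timeout="120s"
# Clone the primitive so Pacemaker keeps an instance running on each
# storage node, restarting any that fail the monitor check.
clone cl_ceph-osd p_ceph-osd \
    meta interleave="true"
```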
Pacemaker integration has an added benefit: since Pacemaker is aware of the services in a cluster, we can always tell the cluster "I want this many mon instances" or "this many OSDs," and Pacemaker can ensure exactly that. Pacemaker is also unique among cluster managers in that it supports clones, a configuration facility that comes in very handy for Ceph daemon management.

Pacemaker (or, more specifically, its underlying communications/messaging layer) is, obviously, not without limitations. Most people deploy clusters of fewer than 10 nodes, and 32 nodes in one cluster membership is the current maximum that any QA/QE organization regularly tests for reliability. But still, take a 20-node Pacemaker cluster where 8 nodes hold RADOS storage and the other 12 are hypervisor hosts consuming the storage via libvirt/RBD -- such a thing can still manage an ample number of terabytes of storage, and host a pretty large array of virtual machines, don't you think?

Cheers,
Florian