On Fri, Aug 31, 2012 at 11:02 PM, Ryan Nicholson <Ryan.Nicholson@xxxxxxxx> wrote:
> Secondly: Through some trials, I've found that if one loses all of his Monitors in a way that they also lose their disks, one basically loses their cluster. I would like to recommend a lower priority shift in design that allows for "recovery of the entire monitor set from data/snapshots automatically stored at the osd's".
>
> For example, a monitor boots:
> -keyring file and ceph.conf are available
> -monitor sees that it is missing its local copy of maps, etc.
> -goes onto the first OSD's it sees and pulls down a snapshot of the same
> -checks for another running monitor, syncs with it, if not,
> -boots at quorum 0, verifying OSD states
> -life continues.

Having a monitor fetch its initial information from an OSD is full of challenges. The monitor won't know which IP addresses and ports the OSDs are listening on, the OSDs won't trust the monitor enough to talk to it, etc. (it lost its crypto keys, after all). It wouldn't even know which OSD to talk to, and I highly doubt that keeping the backup on every OSD would be a good idea.

> The big deal here, is that while the entire cluster is able to recover from failures using one storage philosophy, the monitors are using an entirely different, and more legacy storage philosophy - basically local RAID/power in numbers. Perhaps this has already been considered, and I would be interested in knowing what people think here, as well. Or perhaps I missed something and this is already done?

That's why you run multiple monitors: together they provide High Availability for the monitor service as a whole. Losing all of your monitors, for any reason, disrupts operation of the cluster. Losing all of their stable storage really is disastrous. This is why you are supposed to deploy them in different failure domains, e.g. in different rows or rooms.

If a monitor has its mon. keyring and ceph.conf, it should be able to join an existing monitor cluster as a new member, with no special-case recovery needed.
I'm not sure what kind of architecture you have that makes losing all of the monitor disks somehow likely, but perhaps you should just take backups of their disks, with plain-old backup tools? Don't try to store that backup in the same Ceph cluster, though.

It would be interesting to hear more about what you're thinking of, here.
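
As a rough illustration of the "plain-old backup tools" suggestion, here is a minimal Python sketch that archives one monitor's data directory into a timestamped tarball. It is not an official Ceph procedure. The function name `backup_mon_data` and both paths are hypothetical. The sketch assumes that the monitor is stopped (or the directory is otherwise quiescent) while it is copied, and that the destination lives outside the Ceph cluster being protected.

```python
import tarfile
import time
from pathlib import Path


def backup_mon_data(mon_data_dir, backup_dir):
    """Archive a monitor data directory into a timestamped .tar.gz.

    Assumption: the monitor is not writing to mon_data_dir during the copy,
    and backup_dir is on storage outside the Ceph cluster itself.
    """
    src = Path(mon_data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"monitor data directory not found: {src}")

    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # One archive per run, named after the directory and the local time.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the directory name.
        tar.add(src, arcname=src.name)
    return archive
```

A call might look like `backup_mon_data("/path/to/mon/data", "/path/to/offsite/backups")`, where the first argument is whatever directory your monitor actually stores its data in; both paths here are placeholders.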