Re: Integration work

On Fri, Aug 31, 2012 at 11:02 PM, Ryan Nicholson
<Ryan.Nicholson@xxxxxxxx> wrote:
> Secondly: through some trials, I've found that if one loses all of one's monitors in a way that also loses their disks, one basically loses the cluster. I would like to recommend a lower-priority design change that allows for "recovery of the entire monitor set from data/snapshots automatically stored at the OSDs".
>
> For example, a monitor boots:
>         -keyring file and ceph.conf are available
>         -monitor sees that it is missing its local copy of maps, etc.
>         -goes to the first OSDs it sees and pulls down a snapshot of the same
>         -checks for another running monitor, syncs with it, if not,
>         -boots at quorum 0, verifying OSD states
>         -life continues.

A monitor fetching its initial information from an OSD is full of
challenges. The monitor won't know what IP addresses and ports the
OSDs are on, the OSDs won't trust the monitor enough to talk to it, and
so on (it lost its crypto keys, after all). It wouldn't even know which
OSD to talk to, and I highly doubt that keeping the backup on every OSD
would be a good idea.

> The big deal here is that while the entire cluster is able to recover from failures using one storage philosophy, the monitors are using an entirely different, more legacy storage philosophy: basically local RAID/power in numbers. Perhaps this has already been considered, and I would be interested in knowing what people think here as well. Or perhaps I missed something and this is already done?

That's why you run multiple monitors: they provide high availability
for the monitor service as a whole. Losing all of your monitors
disrupts operation of the cluster; losing all of their stable storage
really is disastrous. This is why you are supposed to deploy them in
different failure domains, e.g. in different rows or rooms.
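
For what it's worth, in ceph.conf that mostly comes down to listing the
monitors on hosts in separate racks/rooms; the names and addresses below
are made-up placeholders, not a recommendation:

    [mon.a]
        host = rack1-mon
        mon addr = 10.0.1.10:6789
    [mon.b]
        host = rack2-mon
        mon addr = 10.0.2.10:6789
    [mon.c]
        host = room2-mon
        mon addr = 10.0.3.10:6789

Three (or five) monitors spread out like that keep a quorum through the
loss of any single rack or room.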

If a monitor has its mon. keyring and ceph.conf, it should be able to
join an existing monitor cluster as a new member, no special-case
recovery needed.
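
Roughly, the manual procedure is the same as adding a brand-new monitor;
untested as written, and "c", the paths, and the keyring location are
just placeholders:

    # fetch the current monmap from the surviving quorum
    ceph mon getmap -o /tmp/monmap

    # rebuild the local monitor store from that map and the shared mon. keyring
    ceph-mon -i c --mkfs --monmap /tmp/monmap --keyring /etc/ceph/mon.keyring

    # start it; it syncs the rest of its state from the other monitors
    service ceph start mon.c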

I'm not sure what kind of architecture you have that makes losing all
of the monitor disks at once likely; it would be interesting to hear
more about what you're thinking of here. In the meantime, perhaps you
should just take backups of their disks with plain old backup tools?
Just don't try to store those backups in the same Ceph cluster.
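
As a sketch of what I mean (assuming a stock layout with the mon data
under /var/lib/ceph/mon; adjust paths and the backup target to your
setup), even a simple nightly cron job would cover the "every monitor
disk died" case:

    # stop the monitor (or snapshot its filesystem) so the store is consistent
    service ceph stop mon.a
    tar czf /backup/mon.a-$(date +%F).tar.gz /var/lib/ceph/mon/ceph-a
    service ceph start mon.a

    # ship the archive to a host outside the Ceph cluster
    rsync -a /backup/ backuphost:/srv/ceph-mon-backups/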

