On Tue, May 4, 2010 at 4:46 PM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
> Hello,

Hi!

> I have a question with respect to RADOS and RBD and the cluster
> monitor daemons.
>
> 1) Is there any chance that the cluster monitor protocol will be
> enhanced to work practically with only 2 monitor daemons? I ask since
> this seems like it would allow a 2 node RBD based device to
> effectively replace a DRBD based device and yet be much more easily
> expandable to more nodes than DRBD. Many HA systems (say telco racks)
> only have two nodes, and it seems silly to miss out on the
> opportunity to use RBD in those systems.

The problem is that the ceph monitors require a quorum in order to
decide on the cluster state. The way the system works right now, a
2-way monitor setup would be less stable than a system with a single
monitor, since it wouldn't work whenever either of the two monitors
crashed. A possible workaround would be to add a special case for a
2-way mon cluster, where a single mon would suffice for a majority.
I'm not sure whether this is actually feasible. As usual, the devil is
in the details.

> One suggestion I have would be to use some of the same techniques
> that heartbeat uses to determine whether a node has gone down or
> whether there is network segregation instead: a serial port
> connection, common ping nodes (such as a router)...

There is a heartbeat mechanism within the mon cluster, and it is used
by the monitors to keep track of their peers' status. It might be a
good idea to add different configurable types of heartbeats.

> I suspect that if reliable 2 node operation were designed into RBD,
> it would eventually replace some of the uses of DRBD.
>
> 2) Is there any way of preventing two users of an RBD device from
> using the device concurrently? Is there some way to create "locks"
> with RADOS that would die if a node dies?
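To make the quorum arithmetic above concrete, here is a plain sketch
(not ceph code) of why two monitors tolerate no more failures than
one: a majority of n is floor(n/2) + 1, so a 2-mon cluster needs both
mons up, while a 3-mon cluster survives the loss of one.

```python
def majority(n_monitors):
    """Smallest group that outnumbers the rest: more than half of n."""
    return n_monitors // 2 + 1

def tolerated_failures(n_monitors):
    """How many monitors may crash while a quorum can still form."""
    return n_monitors - majority(n_monitors)

for n in (1, 2, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# n=1: majority 1, tolerates 0 failures
# n=2: majority 2, tolerates 0 failures  (no better than 1 mon)
# n=3: majority 2, tolerates 1 failure
# n=5: majority 3, tolerates 2 failures
```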
> If so, this would allow an RBD device to be safely mounted with a
> non-distributed FS such as ext3 exclusively on one of many hosts.
> This would open up the use of RBD devices for linux containers or
> linux vservers which could run on any machine in a cluster (similar
> to the idea of using it with kvm/qemu).

We were just thinking about the proper solution to this problem
ourselves. There are a few options. One is to add some kind of locking
mechanism to the osd, which would allow doing just that. E.g., a
client would take a lock and do whatever it needs to do; a second
client trying to get the lock would only be able to hold it after the
first one has released it. Another option would be to have the clients
handle the mutual exclusion themselves (hence not enforced by the osd)
by setting flags and leases on the rbd header. There are other
options, but the latter would be much easier to implement, and we'll
start from there.

> Thanks, I look forward to playing with RBD and ceph!

Thank you!

Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
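P.S. The client-side lease idea discussed above (flags and leases on
the rbd header, not enforced by the osd) could be sketched roughly as
follows. This is purely illustrative: the `header` dict stands in for
the rbd header object, and the function names are hypothetical, not
librados calls. A real implementation would also need an atomic
compare-and-swap on the header so two clients can't both claim an
expired lease.

```python
import time

LEASE_SECONDS = 30

def try_acquire(header, client_id, now=None):
    """Claim the lease in the header if it is free or has expired."""
    now = time.time() if now is None else now
    holder = header.get("lock_holder")
    expires = header.get("lock_expires", 0)
    if holder is None or expires <= now:
        header["lock_holder"] = client_id
        header["lock_expires"] = now + LEASE_SECONDS
        return True
    return holder == client_id  # we already hold the lease

def release(header, client_id):
    """Drop the lease, but only if we are the current holder."""
    if header.get("lock_holder") == client_id:
        del header["lock_holder"]
        del header["lock_expires"]

header = {}
print(try_acquire(header, "client-a", now=0))   # True: lock was free
print(try_acquire(header, "client-b", now=1))   # False: held by client-a
print(try_acquire(header, "client-b", now=40))  # True: lease expired
```

The lease expiry is what makes the lock "die if a node dies": a client
that crashes simply stops renewing, and after LEASE_SECONDS another
host may take over the device.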