--- On Wed, 5/5/10, Yehuda Sadeh Weinraub <yehudasa@xxxxxxxxx> wrote:

> The problem is that the ceph monitors require a quorum in order to
> decide on the cluster state. The way the system works right now, a
> 2-way monitor setup would be less stable than a system with a single
> monitor, since it wouldn't work whenever either of the two monitors
> crashes.

Right, that is indeed not nice. :)

> A possible workaround would be to have a special case for 2-way mon
> clusters, where a single mon would be enough for a majority. I'm not
> sure whether this is actually feasible. As usual, the devil is in
> the details.

Yes. One simple way is to use a ping node: if a monitor can reach the
ping node but not its peer, it can assume "lone operation" and thus
temporarily degrade to a single-monitor setup. (A rough sketch of this
decision logic is appended after my signature.)

I guess my question is: is this something the ceph project would
potentially be willing to support for OSDs?

I suspect that supporting dynamic reconfiguration:

http://en.wikipedia.org/wiki/Paxos_algorithm#Cheap_Paxos

would also help a great deal in making clusters more adaptable.

> > One suggestion I have would be to use some of the same techniques
> > that heartbeat uses to determine whether a node has gone down or
> > whether there is network segregation: a serial port connection,
> > common ping nodes (such as a router)...
>
> There is a heartbeat mechanism within the mon cluster, and it's
> being used for the monitors to keep track of their peers' status. It
> might be a good idea to add different configurable types of
> heartbeats.

Yes, specifically, I meant using some of the techniques that the
heartbeat project uses:

http://www.linux-ha.org/wiki/Heartbeat

Ideally (my suggestion), they would make some of these available as a
library so that other projects like RADOS could use them independently,
without having to rewrite them from scratch.

> > 2) Is there any way of preventing two users of an RBD device from
> > using the device concurrently? ...
>
> We were just thinking about the proper solution to this problem
> ourselves. There are a few options. One is to add some kind of
> locking mechanism to the osd, which would allow doing just that.
> E.g., a client would take a lock, do whatever it needs to do; a
> second client would try to get the lock but would be able to hold it
> only after the first one has released it. Another option would be to
> have the clients handle the mutual exclusion themselves (hence not
> enforced by the osd) by setting flags and leases on the rbd header.

I'm curious: do you mean a scheme such as regularly writing the name
of the node "locking" the image, along with a timestamp, to the header
as a heartbeat, plus some lock-acquisition logic? (I've appended a
sketch of what I have in mind below as well.)

Thanks for the replies!

-Martin
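
P.S. To make the ping-node idea concrete, here is a rough standalone
sketch of the decision a monitor in a 2-mon cluster could make. All
the names here (evaluate_two_mon_cluster, MonState, the probe
callback) are made up for illustration and are not from the Ceph tree:

  #include <functional>
  #include <iostream>
  #include <string>

  enum class MonState { Quorum, LoneOperation, Stalled };

  // reach() stands for whatever reachability probe the implementation
  // uses (ICMP ping, TCP connect, serial link, ...).
  MonState evaluate_two_mon_cluster(
      const std::function<bool(const std::string&)>& reach,
      const std::string& peer, const std::string& ping_node) {
      if (reach(peer))
          return MonState::Quorum;         // normal 2-mon majority
      if (reach(ping_node))
          return MonState::LoneOperation;  // peer down, network is fine
      return MonState::Stalled;            // we are likely the cut-off one
  }

  int main() {
      // Simulated probe: peer unreachable, router reachable.
      auto probe = [](const std::string& h) { return h == "router"; };
      if (evaluate_two_mon_cluster(probe, "mon.b", "router")
              == MonState::LoneOperation)
          std::cout << "degrade to single-mon operation\n";
  }

The point is that "can reach the ping node but not the peer" is the
only case where it seems safe to keep serving alone; if the ping node
is unreachable too, the monitor should assume it is the one that got
isolated and stall instead.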
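
P.P.S. And here is the kind of header-lease scheme I was asking about
for rbd, again as a self-contained illustrative sketch rather than
real Ceph/RBD code: the holder records (node name, expiry) in the
header and refreshes it as a heartbeat, and another client may only
take the lock once that lease has expired. RbdHeader and HeaderLease
are stand-ins for the real on-disk object, and in practice the
read-modify-write in try_acquire would have to be atomic on the osd:

  #include <chrono>
  #include <iostream>
  #include <optional>
  #include <string>

  using Clock = std::chrono::steady_clock;

  struct HeaderLease {
      std::string holder;
      Clock::time_point expiry;
  };

  struct RbdHeader {
      std::optional<HeaderLease> lease;  // absent == unlocked
  };

  // Take the lock for `node` with a lease of length `ttl`; fails if a
  // live lease is held by someone else.
  bool try_acquire(RbdHeader& hdr, const std::string& node,
                   Clock::duration ttl) {
      auto now = Clock::now();
      if (hdr.lease && hdr.lease->expiry > now &&
          hdr.lease->holder != node)
          return false;
      hdr.lease = HeaderLease{node, now + ttl};
      return true;
  }

  // Heartbeat: the current holder pushes its expiry forward.
  bool renew(RbdHeader& hdr, const std::string& node,
             Clock::duration ttl) {
      if (!hdr.lease || hdr.lease->holder != node)
          return false;
      hdr.lease->expiry = Clock::now() + ttl;
      return true;
  }

  int main() {
      RbdHeader hdr;
      auto ttl = std::chrono::seconds(30);
      std::cout << try_acquire(hdr, "node-a", ttl) << "\n";  // 1: taken
      std::cout << try_acquire(hdr, "node-b", ttl) << "\n";  // 0: held
      std::cout << renew(hdr, "node-a", ttl) << "\n";        // 1: renewed
  }

The awkward part is picking the lease length: too short and a busy
holder gets its lock stolen, too long and failover after a crash is
slow. That's why I was wondering what acquisition logic you had in
mind.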