On Tue, May 4, 2010 at 4:46 PM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
> Hello,

Hi!

> I have a question with respect to RADOS and RBD and the cluster
> monitor daemons.
>
> 1) Is there any chance that the cluster monitor protocol will be
> enhanced to work practically with only 2 monitor daemons? I ask since
> this seems like it would allow a 2 node RBD based device to
> effectively replace a DRBD based device and yet be much more easily
> expandable to more nodes than DRBD. Many HA systems (say telco racks)
> only have two nodes, and it seems silly to miss out on the
> opportunity to use RBD in those systems.

The problem is that the ceph monitors require a quorum in order to
decide on the cluster state. The way the system works right now, a
2-way monitor setup would be less stable than a system with a single
monitor, since it wouldn't work whenever either of the two monitors
crashed. A possible workaround would be to add a special case for a
2-way mon cluster, where a single mon would suffice for a majority.
I'm not sure whether this is actually feasible. As usual, the devil is
in the details.

> One suggestion I have would be to use some of the same techniques
> that heartbeat uses to determine whether a node has gone down or
> whether there is network segregation instead: a serial port
> connection, common ping nodes (such as a router)...

There is a heartbeat mechanism within the mon cluster, and it is used
by the monitors to keep track of their peers' status. It might be a
good idea to add different configurable types of heartbeats.

> I suspect that if reliable 2 node operation were designed into RBD,
> it would eventually replace some of the uses of DRBD.
>
> 2) Is there any way of preventing two users of an RBD device from
> using the device concurrently? Is there some way to create "locks"
> with RADOS that would die if a node dies?
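To make the quorum arithmetic above concrete, here is a plain sketch
(not ceph code) of why two monitors tolerate no more failures than
one: a majority of n is floor(n/2) + 1, so a 2-mon cluster needs both
mons up, while a 3-mon cluster survives the loss of one.

```python
def majority(n_monitors):
    """Smallest group that outnumbers the rest: more than half of n."""
    return n_monitors // 2 + 1

def tolerated_failures(n_monitors):
    """How many monitors may crash while a quorum can still form."""
    return n_monitors - majority(n_monitors)

for n in (1, 2, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# n=1: majority 1, tolerates 0 failures
# n=2: majority 2, tolerates 0 failures  (no better than 1 mon)
# n=3: majority 2, tolerates 1 failure
# n=5: majority 3, tolerates 2 failures
```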
> If so, this would allow an RBD device to be safely mounted with a
> non-distributed FS such as ext3 exclusively on one of many hosts.
> This would open up the use of RBD devices for linux containers or
> linux vservers which could run on any machine in a cluster (similar
> to the idea of using it with kvm/qemu).

We were just thinking about the proper solution to this problem
ourselves. There are a few options. One is to add some kind of locking
mechanism to the osd, which would allow doing just that. E.g., a
client would take a lock and do whatever it needs to do; a second
client trying to get the lock would only be able to hold it after the
first one has released it. Another option would be to have the clients
handle the mutual exclusion themselves (hence not enforced by the osd)
by setting flags and leases on the rbd header. There are other
options, but the latter would be much easier to implement, and we'll
start from there.

> Thanks, I look forward to playing with RBD and ceph!

Thank you!

Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
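P.S. The client-side lease idea discussed above (flags and leases on
the rbd header, not enforced by the osd) could be sketched roughly as
follows. This is purely illustrative: the `header` dict stands in for
the rbd header object, and the function names are hypothetical, not
librados calls. A real implementation would also need an atomic
compare-and-swap on the header so two clients can't both claim an
expired lease.

```python
import time

LEASE_SECONDS = 30

def try_acquire(header, client_id, now=None):
    """Claim the lease in the header if it is free or has expired."""
    now = time.time() if now is None else now
    holder = header.get("lock_holder")
    expires = header.get("lock_expires", 0)
    if holder is None or expires <= now:
        header["lock_holder"] = client_id
        header["lock_expires"] = now + LEASE_SECONDS
        return True
    return holder == client_id  # we already hold the lease

def release(header, client_id):
    """Drop the lease, but only if we are the current holder."""
    if header.get("lock_holder") == client_id:
        del header["lock_holder"]
        del header["lock_expires"]

header = {}
print(try_acquire(header, "client-a", now=0))   # True: lock was free
print(try_acquire(header, "client-b", now=1))   # False: held by client-a
print(try_acquire(header, "client-b", now=40))  # True: lease expired
```

The lease expiry is what makes the lock "die if a node dies": a client
that crashes simply stops renewing, and after LEASE_SECONDS another
host may take over the device.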