RE: [Discussion] Enhancement for CRUSH rules

Hi list,
     I am thinking about the possibility of adding some primitives to CRUSH to meet the following user stories:
A. "Same host", "Same rack"
	To balance between availability and performance, one may want a rule like: 3 replicas, where Replica 1 and Replica 2 are in the same rack while Replica 3 resides in another rack. This is common because a typical datacenter deployment usually has much less uplink bandwidth than backbone bandwidth.

More aggressive users may even want the same host, since the most common failure is disk failure, and several disks (which also means several OSDs) reside in the same physical machine. If we can place Replica 1 & 2 on the same host and Replica 3 somewhere else, it will not only reduce replication traffic but also save a lot of time & bandwidth when a disk failure happens and a recovery takes place (see the rule sketch below).
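Something close to A may already be expressible with the existing rule steps by over-selecting and letting the pool size truncate the result. A minimal sketch, assuming the classic crushtool rule syntax, a root bucket named "default", and the usual rack/host bucket types (names here are just examples):

    rule two_replicas_same_rack {
            ruleset 1
            type replicated
            min_size 3
            max_size 3
            step take default
            # pick 2 racks, then 2 hosts in each: 4 candidate OSDs in total,
            # of which a size-3 pool uses only the first 3, i.e. 2 OSDs in
            # the first rack and 1 in the second
            step choose firstn 2 type rack
            step chooseleaf firstn 2 type host
            step emit
    }

The "same host" variant would be the same idea one level down (choose firstn 2 type host, then firstn 2 type osd). The question is whether a dedicated primitive could express this more directly than such an over-selection trick.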
B."local"
	 Although we cannot mount RBD volumes to where a OSD running at, but QEMU canbe used. This scenarios is really common in cloud computing. We have a large amount of compute-nodes, just plug in some disks 	and make the machines reused for Ceph cluster. To reduce network traffic and latency , if it is possible to have some placement-group-maybe 3 PG for a compute-node. Define the rules like: primary copy of the PG 	should (if possible) reside in localhost, the second replica should go different places
	
	By doing this, a significant amount of network bandwidth and an RTT per request can be saved. What's more, since reads always go to the primary, they would benefit a lot from such a mechanism.
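The closest thing I can see with the current rule language is one rule (and therefore one pool) per compute node, which is exactly why B does not feel scalable. A sketch, assuming the same classic syntax and a hypothetical host bucket named "compute01":

    rule local_primary_compute01 {
            ruleset 2
            type replicated
            min_size 3
            max_size 3
            # force the primary onto the local node
            step take compute01
            step choose firstn 1 type osd
            step emit
            # place the remaining (size - 1) replicas elsewhere in the cluster
            step take default
            step chooseleaf firstn -1 type host
            step emit
    }

Besides needing a rule and a pool per node, as far as I know CRUSH does not deduplicate across the two emit blocks, so the second pass can still land a replica back on compute01. That gap is where a new primitive (or a "local" hint at the pool/PG level) would be needed.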

It looks to me that A is simpler while B seems much more complex. Hoping for input.

                                                                        Xiaoxi

