On 07/30/2012 07:46 PM, Eric_YH_Chen@xxxxxxxxxx wrote:
Hi, all:

I am testing the high-availability of ceph.

Environment: two servers, with 12 hard disks on each server.
Version: Ceph 0.48
Kernel: 3.2.0-27

We created a ceph cluster with 24 osds. osd.0 ~ osd.11 are on server1 and
osd.12 ~ osd.23 are on server2. The crush rule is the default rule:

rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1536
pgp_num 1536 last_change 1172 owner 0

Test case 1:
1. Create an rbd device and read/write to it.
2. Randomly turn off one osd on server1 (service ceph stop osd.0).
3. Check the read/write of the rbd device.

Test case 2:
1. Create an rbd device and read/write to it.
2. Randomly turn off one osd on server1 (service ceph stop osd.0).
3. Randomly turn off one osd on server2 (service ceph stop osd.12).
4. Check the read/write of the rbd device.

In test case 1 we can access the rbd device as normal, but in test case 2
it hangs with no response. Is this the expected behavior? I imagined we could
turn off any two osds when replication is set to 2: even without the master
data, there would be two other copies on two different osds, so with two osds
down the data could still be found on a third osd. Am I misunderstanding
something? Thanks!
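For readers less familiar with CRUSH rules: with "step chooseleaf firstn 0
type host" and rep size 2, each placement group gets one copy on an osd under
each of two distinct hosts. Here is a minimal Python sketch of that placement
on this cluster; it is a simplified model for illustration, not Ceph's actual
CRUSH code, and the osd numbering is taken from the setup above:

import random

HOSTS = {
    "server1": list(range(0, 12)),    # osd.0  .. osd.11
    "server2": list(range(12, 24)),   # osd.12 .. osd.23
}

def place_pg(size=2):
    """Toy model of 'chooseleaf firstn 0 type host' for a size-2 pool:
    pick `size` distinct hosts, then one osd under each."""
    hosts = random.sample(list(HOSTS), size)
    return [random.choice(HOSTS[h]) for h in hosts]

print(place_pg())   # e.g. [3, 17]: one copy on each server

So each PG has exactly two copies in total, one per server.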
rep size is the total number of copies, so stopping two osds with rep size 2
may cause you to lose access to some objects.

Josh
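To make that concrete with the numbers from the setup above (a rough estimate
that assumes each PG's two copies land on a uniformly random osd of each host,
which is only an approximation of what CRUSH does): a given PG has both copies
on the stopped pair {osd.0, osd.12} with probability about 1/12 * 1/12 = 1/144,
so roughly 1536 / 144 ~ 11 of the 1536 PGs end up with no live copy. A quick
sketch to count them under that assumption:

import random

random.seed(1)                 # reproducible toy run
PG_NUM = 1536
SERVER1 = list(range(0, 12))   # osd.0  .. osd.11
SERVER2 = list(range(12, 24))  # osd.12 .. osd.23
DOWN = {0, 12}                 # osds stopped in test case 2

no_live_copy = 0
for _ in range(PG_NUM):
    # toy placement: one copy on a random osd of each server
    acting = {random.choice(SERVER1), random.choice(SERVER2)}
    if acting <= DOWN:         # both copies are on stopped osds
        no_live_copy += 1

print(no_live_copy, "of", PG_NUM, "PGs have no live copy")

Any rbd I/O that touches an object in one of those PGs blocks until a copy
comes back, which would explain the hang seen in test case 2. Note also that
with only two hosts and "chooseleaf ... type host", raising the pool size
above 2 cannot place a third copy on a separate host.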