High-availability testing of Ceph

Hi, all:

I am testing the high availability of Ceph.

Environment: two servers, each with 12 hard disks.
Version: Ceph 0.48
Kernel: 3.2.0-27

We created a Ceph cluster with 24 OSDs:
osd.0 ~ osd.11 are on server1
osd.12 ~ osd.23 are on server2

The CRUSH rule is the default rule:
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1536 pgp_num 1536 last_change 1172 owner 0
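
With "step chooseleaf firstn 0 type host" and rep size 2, each PG should get one copy on each of the two servers. For reference, a quick way to check where a given object's copies land (the object name below is only an example):

ceph osd tree                  # show the host/osd hierarchy CRUSH works with
ceph osd map rbd some-object   # show the PG and the up/acting OSDs for that object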

Test case 1:
1. Create an rbd device and read/write to it.
2. Randomly turn off one OSD on server1 (service ceph stop osd.0).
3. Check the read/write of the rbd device (a command sketch follows below).
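
Roughly, the commands for this kind of test would look something like the following (image name, size and device path are just placeholders):

rbd create test1 --size 1024        # 1 GB image in the 'rbd' pool
rbd map test1                       # shows up as a block device, e.g. /dev/rbd0
dd if=/dev/zero of=/dev/rbd0 bs=1M count=100 oflag=direct   # write
dd if=/dev/rbd0 of=/dev/null bs=1M count=100 iflag=direct   # read back
service ceph stop osd.0             # stop one OSD on server1
ceph -s                             # some PGs go degraded, cluster stays active
dd if=/dev/rbd0 of=/dev/null bs=1M count=100 iflag=direct   # I/O still completes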

Test case 2:
1. Create an rbd device and read/write to it.
2. Randomly turn off one OSD on server1 (service ceph stop osd.0).
3. Randomly turn off one OSD on server2 (service ceph stop osd.12).
4. Check the read/write of the rbd device.
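
The only difference in the second case is stopping one OSD on each server before re-checking the I/O (again only a sketch):

service ceph stop osd.0       # on server1
service ceph stop osd.12      # on server2
ceph -s                       # check how many PGs are degraded or down
dd if=/dev/rbd0 of=/dev/null bs=1M count=100 iflag=direct   # this is where it hangs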

In test case 1 we can still access the rbd device normally, but in test case 2 the I/O to the rbd device hangs with no response.
Is this the expected behavior?

I expected that we could turn off any two OSDs when replication is set to 2,
because besides the master copy there would be two other copies on two different OSDs,
so even with two OSDs turned off, the data could still be found on a third OSD.
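
In case it helps to narrow this down, one way to check whether any PG had both of its copies on the two stopped OSDs (the exact 'ceph pg dump' output format may differ in 0.48):

ceph pg dump | grep '\[0,12\]'   # PGs whose up/acting set is [0,12], i.e. only osd.0 and osd.12
ceph health                      # reports PGs that are degraded or down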
Any misunderstanding? Thanks!


