On 07/30/2012 07:46 PM, Eric_YH_Chen@xxxxxxxxxx wrote:
Hi, all:

I am testing the high-availability of ceph.

Environment: two servers, with 12 hard disks on each server.
Version: Ceph 0.48
Kernel: 3.2.0-27

We created a ceph cluster with 24 osds. osd.0 ~ osd.11 are on server1 and
osd.12 ~ osd.23 are on server2. The crush rule is the default rule:

rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1536
pgp_num 1536 last_change 1172 owner 0

Test case 1:
1. Create an rbd device and read/write to it.
2. Randomly turn off one osd on server1 (service ceph stop osd.0).
3. Check the read/write of the rbd device.

Test case 2:
1. Create an rbd device and read/write to it.
2. Randomly turn off one osd on server1 (service ceph stop osd.0).
3. Randomly turn off one osd on server2 (service ceph stop osd.12).
4. Check the read/write of the rbd device.

In test case 1 we can access the rbd device as normal, but in test case 2
it hangs with no response. Is this the expected behavior? I imagined we could
turn off any two osds when replication is set to 2: even without the master
data, there would be two other copies on two different osds, so with two osds
down the data could still be found on a third osd. Am I misunderstanding
something? Thanks!
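For readers less familiar with CRUSH rules: with "step chooseleaf firstn 0
type host" and rep size 2, each placement group gets one copy on an osd under
each of two distinct hosts. Here is a minimal Python sketch of that placement
on this cluster; it is a simplified model for illustration, not Ceph's actual
CRUSH code, and the osd numbering is taken from the setup above:

import random

HOSTS = {
    "server1": list(range(0, 12)),    # osd.0  .. osd.11
    "server2": list(range(12, 24)),   # osd.12 .. osd.23
}

def place_pg(size=2):
    """Toy model of 'chooseleaf firstn 0 type host' for a size-2 pool:
    pick `size` distinct hosts, then one osd under each."""
    hosts = random.sample(list(HOSTS), size)
    return [random.choice(HOSTS[h]) for h in hosts]

print(place_pg())   # e.g. [3, 17]: one copy on each server

So each PG has exactly two copies in total, one per server.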
rep size is the total number of copies, so stopping two osds with rep size 2
may cause you to lose access to some objects.

Josh
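To make that concrete with the numbers from the setup above (a rough estimate
that assumes each PG's two copies land on a uniformly random osd of each host,
which is only an approximation of what CRUSH does): a given PG has both copies
on the stopped pair {osd.0, osd.12} with probability about 1/12 * 1/12 = 1/144,
so roughly 1536 / 144 ~ 11 of the 1536 PGs end up with no live copy. A quick
sketch to count them under that assumption:

import random

random.seed(1)                 # reproducible toy run
PG_NUM = 1536
SERVER1 = list(range(0, 12))   # osd.0  .. osd.11
SERVER2 = list(range(12, 24))  # osd.12 .. osd.23
DOWN = {0, 12}                 # osds stopped in test case 2

no_live_copy = 0
for _ in range(PG_NUM):
    # toy placement: one copy on a random osd of each server
    acting = {random.choice(SERVER1), random.choice(SERVER2)}
    if acting <= DOWN:         # both copies are on stopped osds
        no_live_copy += 1

print(no_live_copy, "of", PG_NUM, "PGs have no live copy")

Any rbd I/O that touches an object in one of those PGs blocks until a copy
comes back, which would explain the hang seen in test case 2. Note also that
with only two hosts and "chooseleaf ... type host", raising the pool size
above 2 cannot place a third copy on a separate host.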