Re: will crush rule be used during object relocation in OSD failure ?


 



On Fri, Nov 23, 2018 at 11:01 AM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:

Hi all,


We have 8 OSD hosts, 4 in room 1 and 4 in room 2.

A pool with size = 3 was created using the following CRUSH rule, to tolerate a room failure.



rule multiroom {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}
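
As a rough illustration of the shape of this rule, here is a toy Python model. It is not the real CRUSH algorithm: the host names, the `toy_multiroom` function, and the use of `random` as a stand-in for CRUSH's hash are all assumptions made for the sketch. It only shows that picking 2 rooms and then 2 hosts per room yields 4 candidates, of which a size=3 pool would keep the first 3.

```python
import random

# Hypothetical topology matching the post: 2 rooms with 4 hosts each.
TOPOLOGY = {
    "room1": ["host1", "host2", "host3", "host4"],
    "room2": ["host5", "host6", "host7", "host8"],
}

def toy_multiroom(pg_seed, size=3):
    """Toy stand-in for the rule: mimics its shape, NOT the real CRUSH hash."""
    rng = random.Random(pg_seed)
    # step choose firstn 2 type room
    rooms = rng.sample(sorted(TOPOLOGY), 2)
    picks = []
    for room in rooms:
        # step chooseleaf firstn 2 type host
        for host in rng.sample(TOPOLOGY[room], 2):
            picks.append((room, host))
    # Assumption: a size=3 pool keeps the first 3 of the 4 candidates.
    return picks[:size]

for pg in range(4):
    print(pg, toy_multiroom(pg))
```

Under these assumptions every placement has 2 hosts in the room that was picked first and 1 host in the other, and which room comes first varies per PG.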



We're expecting:

1. For each object, there are always 2 replicas in one room and 1 replica in the other room, making size=3. But we can't control which room has 1 or 2 replicas.

Right.
 

2. In case an OSD host fails, Ceph will assign remaining OSDs to the same PG to hold the replicas that were on the failed host. Selection is based on the pool's CRUSH rule, so the same failure domain is maintained and the replicas won't all end up in the same room.

Yes, if a host fails the copies it held will be replaced by new copies in the same room.
 
  
3. In case the entire room holding 1 replica fails, the pool will remain degraded but won't do any replica relocation.

Right.
 

4. In case the entire room holding 2 replicas fails, Ceph will use OSDs in the surviving room to make 2 replicas. The pool will not be writeable until all objects have 2 copies (unless we make pool size=4?). Then, when recovery is complete, the pool will remain in a degraded state until the failed room recovers.

Hmm, I'm actually not sure if this will work out. Because CRUSH is hierarchical, it will keep trying to select hosts from the dead room and will fill out the location vector's first two spots with -1. It could be that Ceph will skip all those "nonexistent" entries and just pick the two copies from slots 3 and 4, but it might not. You should test this carefully and report back!
-Greg
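
To make the two possibilities in point 4 concrete, here is a small Python sketch. It does not reproduce Ceph's behaviour: the function names `skip_holes` and `keep_holes`, the OSD ids 12 and 17, and the idea that the choice comes down to the order of "drop -1" and "cut to size" are assumptions used only to show why the outcome matters.

```python
NONE = -1  # placeholder for a slot that could not be filled

def skip_holes(raw, size):
    """Outcome A: drop unfilled slots first, then keep up to `size` entries."""
    return [osd for osd in raw if osd != NONE][:size]

def keep_holes(raw, size):
    """Outcome B: cut to `size` slots first, then drop the unfilled ones."""
    return [osd for osd in raw[:size] if osd != NONE]

# Hypothetical 4-slot result: the dead room's two slots are unfilled,
# the surviving room supplies OSDs 12 and 17 (made-up ids).
raw = [NONE, NONE, 12, 17]

print(skip_holes(raw, 3))  # [12, 17]
print(keep_holes(raw, 3))  # [12]
```

If something like outcome A applies, the surviving room provides 2 copies; if something like outcome B applies, only 1 copy remains. Which one matches a real cluster is exactly what needs to be tested.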


Is our understanding correct?  Thanks a lot.
Will do some simulation later to verify.

Regards,
/stwong

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
