Re: will crush rule be used during object relocation in OSD failure ?


 



Will there be much difference in performance between EC and replicated? Thanks. I hope we can do more testing on EC before the deadline of our first production Ceph...

In general, yes, there will be a difference in performance. Of course it depends on the actual configuration, but if you rely on performance I would stick with replication. Running your own tests with EC on your existing setup will reveal performance differences and help you decide which way to go.

Regards,
Eugen


Quoting "ST Wong (ITSC)" <ST@xxxxxxxxxxxxxxxx>:

Hi,

Thanks. As the power supply to one of our server rooms is not very stable, we will probably use size=4, min_size=2 to prevent data loss.

If the overhead is too high could EC be an option for your setup?

Will there be much difference in performance between EC and replicated? Thanks. I hope we can do more testing on EC before the deadline of our first production Ceph...

Regards,
/st

-----Original Message-----
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On Behalf Of Eugen Block
Sent: Tuesday, February 12, 2019 5:32 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: will crush rule be used during object relocation in OSD failure ?

Hi,

I came to the same conclusion after doing various tests with rooms and failure domains. I agree with Maged and suggest using size=4, min_size=2 for replicated pools. It's more overhead, but you can survive the loss of one room and even one more OSD (of the affected PG) without losing data. You'll also have the certainty that there are always two replicas per room, with no guessing or hoping which room is more likely to fail.
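The arithmetic behind that suggestion can be sketched as follows. This is only a toy model: the function name and the simplified status labels are made up for illustration and are not Ceph PG states. It assumes the replicas are split evenly across two rooms, that a PG serves I/O while at least min_size replicas survive, and that data is lost only when no replica is left.

```python
# Toy model (not Ceph code): survivability of a replicated PG
# whose replicas are split evenly across two rooms.

def pg_status(surviving: int, min_size: int) -> str:
    """Simplified, hypothetical status labels for illustration only."""
    if surviving == 0:
        return "data lost"
    if surviving < min_size:
        return "inactive (data intact)"
    return "active"

size, min_size = 4, 2
per_room = size // 2                      # two replicas in each room

after_room_loss = size - per_room         # one whole room is gone
after_extra_osd = after_room_loss - 1     # plus one more OSD of the same PG

print(pg_status(size, min_size))             # active
print(pg_status(after_room_loss, min_size))  # active
print(pg_status(after_extra_osd, min_size))  # inactive (data intact)
```

With size=3 the same room failure can leave a PG with a single replica, which is below min_size=2.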

If the overhead is too high could EC be an option for your setup?

Regards,
Eugen


Quoting "ST Wong (ITSC)" <ST@xxxxxxxxxxxxxxxx>:

Hi all,

Tested 4 cases. Cases 1-3 behaved as expected, while for case 4 no
rebuild took place in the surviving room, as Gregory mentioned.
Repeating case 4 several times on both rooms gave the same result.
We're running mimic 13.2.2.

E.g.

Room 1
Host 1 osd: 2,5
Host 2 osd: 1,3

Room 2  <-- failed room
Host 3 osd: 0,4
Host 4 osd: 6,7


Before:
5.62 0 0 0 0 0 0 0 0 active+clean 2019-02-12 04:47:28.183375 0'0 3643:2299 [0,7,5] 0 [0,7,5] 0 0'0 2019-02-12 04:47:28.183218 0'0 2019-02-11 01:20:51.276922 0

After:
5.62 0 0 0 0 0 0 0 0 undersized+peered 2019-02-12 09:10:59.101096 0'0 3647:2284 [5] 5 [5] 5 0'0 2019-02-12 04:47:28.183218 0'0 2019-02-11 01:20:51.276922 0
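A small sketch for reading the two rows above. It assumes that the first bracketed list in a row is the up set and the second is the acting set, and it uses the OSD-to-room layout given in this message. The function names are made up for illustration.

```python
import re

# OSD -> room, taken from the layout listed in this message.
ROOM_OF_OSD = {2: 1, 5: 1, 1: 1, 3: 1, 0: 2, 4: 2, 6: 2, 7: 2}

before = ("5.62 0 0 0 0 0 0 0 0 active+clean 2019-02-12 04:47:28.183375 "
          "0'0 3643:2299 [0,7,5] 0 [0,7,5] 0 0'0 2019-02-12 04:47:28.183218 "
          "0'0 2019-02-11 01:20:51.276922 0")
after = ("5.62 0 0 0 0 0 0 0 0 undersized+peered 2019-02-12 09:10:59.101096 "
         "0'0 3647:2284 [5] 5 [5] 5 0'0 2019-02-12 04:47:28.183218 "
         "0'0 2019-02-11 01:20:51.276922 0")

def up_and_acting(row):
    """Assumption: first [..] list is the up set, second is the acting set."""
    sets = re.findall(r"\[([\d,]*)\]", row)
    up, acting = ([int(x) for x in s.split(",") if x] for s in sets[:2])
    return up, acting

def replicas_per_room(osds):
    """Count how many of the given OSDs sit in each room."""
    counts = {}
    for osd in osds:
        room = ROOM_OF_OSD[osd]
        counts[room] = counts.get(room, 0) + 1
    return counts

up_before, _ = up_and_acting(before)
up_after, _ = up_and_acting(after)
print(replicas_per_room(up_before))  # {2: 2, 1: 1}
print(replicas_per_room(up_after))   # {1: 1}
```

So PG 5.62 had two replicas in the failed room 2, and only the single room 1 replica on osd.5 remained afterwards.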

Fyi.   Sorry for the belated report.

Thanks a lot.
/st


From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: Monday, November 26, 2018 9:27 PM
To: ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  will crush rule be used during object
relocation in OSD failure ?

On Fri, Nov 23, 2018 at 11:01 AM ST Wong (ITSC) <ST@xxxxxxxxxxxxxxxx> wrote:

Hi all,



We have 8 OSD hosts, 4 in room 1 and 4 in room 2.

A pool with size = 3 is created using the following CRUSH rule, to
cater for room failure.


rule multiroom {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}
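To illustrate the shape of the result this rule produces, here is a toy model. It is not CRUSH: the sha256-based ranking is only a stand-in for CRUSH's own pseudo-random selection, and the host names are hypothetical. The rule yields up to 2 rooms x 2 hosts = 4 candidates, and the model assumes the pool size then truncates that list.

```python
import hashlib
from collections import Counter

# Hypothetical host names; the post only says 4 hosts per room.
TOPOLOGY = {
    "room1": ["host1", "host2", "host3", "host4"],
    "room2": ["host5", "host6", "host7", "host8"],
}
ROOM_OF_HOST = {h: r for r, hosts in TOPOLOGY.items() for h in hosts}

def rank(pg, items):
    """Deterministic pseudo-random order keyed on the PG id (stand-in for CRUSH)."""
    return sorted(
        items,
        key=lambda item: hashlib.sha256(f"{pg}:{item}".encode()).hexdigest(),
    )

def place(pg, size):
    candidates = []
    for room in rank(pg, TOPOLOGY)[:2]:             # step choose firstn 2 type room
        candidates += rank(pg, TOPOLOGY[room])[:2]  # step chooseleaf firstn 2 type host
    return candidates[:size]                        # pool size cuts the 4 candidates

def replicas_per_room(hosts):
    return Counter(ROOM_OF_HOST[h] for h in hosts)

for pg in ("5.60", "5.61", "5.62"):
    print(pg, dict(replicas_per_room(place(pg, size=3))))
```

In this model, size=3 always gives a 2+1 split whose "heavy" room depends on the PG, while size=4 gives 2+2.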




We're expecting:

1. For each object, there are always 2 replicas in one room and 1
replica in the other room, making size=3.  But we can't control which
room has 1 or 2 replicas.

Right.


2. In case an OSD host fails, Ceph will assign the remaining OSDs to
the same PG to hold the replicas that were on the failed OSD host.
Selection is based on the pool's CRUSH rule, thus maintaining the
same failure domain: it won't put all replicas in the same room.

Yes, if a host fails the copies it held will be replaced by new copies
in the same room.


3. In case the entire room with 1 replica fails, the pool will remain
degraded but won't do any replica relocation.

Right.


4. In case the entire room with 2 replicas fails, Ceph will use the
OSDs in the surviving room to make 2 replicas.  The pool will not be
writeable until all objects have 2 copies (unless we make the pool
size=4?).  Then, when recovery is complete, the pool will remain in a
degraded state until the failed room recovers.

Hmm, I'm actually not sure if this will work out — because CRUSH is
hierarchical, it will keep trying to select hosts from the dead room
and will fill out the location vector's first two spots with -1. It
could be that Ceph will skip all those "nonexistent" entries and just
pick the two copies from slots 3 and 4, but it might not. You should
test this carefully and report back!
-Greg
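The concern can be pictured with a toy placement vector. This sketch only mirrors the description above (slots that belong to the dead room end up empty); it does not model CRUSH's actual retry behaviour. The function name is made up, and the example slots use the OSD numbers of PG 5.62 reported elsewhere in this thread.

```python
# Toy model (not CRUSH): entries that fall in the dead room become empty
# slots, shown here as None, and only the rest can hold data.

def map_after_failure(slots, dead_room):
    """slots: list of (room, osd) pairs in rule order.

    Returns (vector with holes, surviving OSDs)."""
    with_holes = [osd if room != dead_room else None for room, osd in slots]
    survivors = [osd for osd in with_holes if osd is not None]
    return with_holes, survivors

# A size=3 PG with two replicas in room2 and one in room1.
slots = [("room2", 0), ("room2", 7), ("room1", 5)]

print(map_after_failure(slots, "room1"))  # ([0, 7, None], [0, 7])
print(map_after_failure(slots, "room2"))  # ([None, None, 5], [5])
```

Losing the room with one replica (case 3) leaves two copies; losing the room with two replicas (case 4) leaves one copy unless replacements are picked in the surviving room.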

Is our understanding correct?  Thanks a lot.
Will do some simulation later to verify.

Regards,
/stwong
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


