Re: Erasure Pool OSD fail




Well, you should use M > 1; the more you have, the less risk and the more performance.

You don't read twice as much data; you read it from different sources. Furthermore, you may even read less data and have to rebuild it, because erasure-coded pools don't replicate the data.
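To put numbers on the read-volume question, here is a minimal Python sketch under a simplified model (an assumption, not Ceph's actual recovery code): one object loses whatever piece lived on the failed OSD, and we count the bytes recovery has to read to rebuild it. The function names are hypothetical.

```python
# Simplified model (assumption): one object of object_size bytes loses the
# replica or chunk that lived on the failed OSD; count what recovery reads.


def replicated_rebuild(object_size: float) -> tuple[float, float, int]:
    """Replicated pool, size=2: the lost replica is copied from the single
    surviving replica. Returns (bytes_lost, bytes_read, osds_read_from)."""
    return object_size, object_size, 1


def ec_rebuild(object_size: float, k: int) -> tuple[float, float, int]:
    """Erasure-coded pool: each chunk is object_size / k. Rebuilding any one
    lost chunk (data or parity) requires reading k surviving chunks."""
    chunk = object_size / k
    return chunk, k * chunk, k


if __name__ == "__main__":
    size = 4.0  # example object size in MiB
    print("replicated size=2:", replicated_rebuild(size))
    print("EC k=2, m=1:      ", ec_rebuild(size, k=2))
```

Per object, both layouts read the same number of bytes, but EC reads them from k OSDs instead of one; per byte that lived on the failed OSD, EC reads k times as much.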


On the other hand, the configuration is not as bad as you think; it's just different.

3-node cluster

Replicated pool, size = 2

    -you can take 1 failure, then rebalance and take another failure (2 at most, and not simultaneously).

    -you use 2*data space

    -you have to write 2*data, full data on one node and full data on the second one.

Erasure-coded pool, K=2, M=1

    -you can only lose 1 node

    -you use less space (1.5*data instead of 2*data)

    -as you don't write 2*data, writes are also faster. You write half the data on one node, half on another and the parity on a separate node, so the write work is much more distributed.

    -reads are slower because you need all the data parts.
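The comparison above can be sketched as per-byte arithmetic. This is a hypothetical model (one replica or one chunk per node, metadata and padding ignored), not output from Ceph itself:

```python
from fractions import Fraction


def replicated(size: int) -> dict:
    """Replicated pool: every OSD in the acting set holds a full copy."""
    return {
        "raw_bytes_per_user_byte": Fraction(size),
        "written_per_osd": Fraction(1),      # each replica receives the full object
        "osds_read_per_object": 1,           # any single replica can serve a read
        "simultaneous_failures_tolerated": size - 1,
    }


def erasure_coded(k: int, m: int) -> dict:
    """EC pool: the object is split into k data chunks plus m parity chunks."""
    return {
        "raw_bytes_per_user_byte": Fraction(k + m, k),
        "written_per_osd": Fraction(1, k),   # each OSD receives one chunk
        "osds_read_per_object": k,           # a whole-object read needs the k data chunks
        "simultaneous_failures_tolerated": m,
    }


if __name__ == "__main__":
    for name, stats in [("replicated size=2", replicated(2)),
                        ("EC k=2, m=1", erasure_coded(2, 1))]:
        print(name, {key: str(value) for key, value in stats.items()})
```

With K=2, M=1 the pool stores 1.5x the data instead of 2x and each OSD receives half an object per write; the price is that a whole-object read touches two OSDs instead of one.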


In both configurations, if you hit corrupted data you lose data, so that's not really a point of comparison.

A replicated pool can handle much more read-intensive workloads, while erasure-coded pools are designed for large writes and very few reads.


I have checked myself that both configurations work on a 3-node cluster, so it's not a matter of a better and a worse configuration; it really depends on your workload. And the best thing :) you can have both on the same OSDs!


On 24/10/2017 at 12:37, Eino Tuominen wrote:

Hello,


Correct me if I'm wrong, but isn't your configuration just twice as bad as running with replication size=2? With replication size=2, when you lose a disk you lose data if even one defective block is found while Ceph is reconstructing the PGs that had a replica on the failed disk. Now, with your setup, you have to be able to read twice as much data correctly in order to reconstruct the PGs. When using EC, I think you have to use m>1 in production.
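Why m>1 is awkward on exactly three nodes can be checked by enumeration. Assuming a host failure domain (each chunk on a different host, so a profile needs k+m hosts), this hypothetical helper lists every profile that fits and its raw-space factor:

```python
from fractions import Fraction


def ec_profiles(hosts: int) -> list[tuple[int, int, Fraction]]:
    """All (k, m, raw_space_factor) with k >= 1, m >= 1 and k + m <= hosts.

    Assumes a host failure domain: every chunk must land on a different
    host, so a profile needs at least k + m hosts."""
    result = []
    for k in range(1, hosts):
        for m in range(1, hosts - k + 1):
            result.append((k, m, Fraction(k + m, k)))
    return result


if __name__ == "__main__":
    for k, m, factor in ec_profiles(3):
        print(f"k={k} m={m} raw space = {factor}x")
```

Under this assumption, the only 3-host profile with m>1 is k=1, m=2, which costs 3x raw space (the same as a replicated pool with size=3); m>1 together with k>1 needs at least 4 hosts.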


-- 

  Eino Tuominen



From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Jorge Pinilla López <jorpilo@xxxxxxxxx>
Sent: Tuesday, October 24, 2017 11:24
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Erasure Pool OSD fail
 

Okay, I think I can answer this myself: the pool is created with a default min_size of 3, so when one of the OSDs goes down, the pool doesn't perform any IO. Manually changing the pool min_size to 2 worked great.
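The behaviour described here follows from how min_size gates IO. A minimal model, assuming a PG serves IO only while at least min_size of its k+m shards are up and that data stays readable while at least k survive (peering, recovery timing and CRUSH placement are ignored; the function names are hypothetical):

```python
def pg_serves_io(up_shards: int, min_size: int) -> bool:
    """A PG only serves client IO while at least min_size shards are up."""
    return up_shards >= min_size


def pool_state(k: int, m: int, min_size: int, failed_osds: int) -> str:
    """Classify an EC PG after some of its k+m OSDs have failed."""
    up = k + m - failed_osds
    if up < k:
        return "data lost or unavailable"  # fewer than k chunks survive
    if not pg_serves_io(up, min_size):
        return "IO blocked"                # recoverable, but below min_size
    return "IO continues"


if __name__ == "__main__":
    # K=2, M=1 on 3 nodes with one OSD stopped, as in the rados bench test
    print(pool_state(2, 1, min_size=3, failed_osds=1))  # IO blocked
    print(pool_state(2, 1, min_size=2, failed_osds=1))  # IO continues
```

With K=2, M=1, lowering min_size to 2 keeps IO going after one failure, but writes then land with no redundancy left, so a second failure before recovery finishes can lose data.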


On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
I am testing erasure-coded pools, running a rados write test to check fault tolerance.
I have 3 nodes with 1 OSD each, K=2, M=1.

While performing the write (rados bench -p replicate 100 write), I stop one of the OSD daemons (e.g. osd.0) to simulate a node failure, and then the whole write stops and I can't write any more data.

  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    1      16        28        12   46.8121        48     1.01548    0.616034
    2      16        40        24   47.3907        48     1.04219    0.923728
    3      16        52        36   47.5889        48    0.593145      1.0038
    4      16        68        52   51.6633        64     1.39638     1.08098
    5      16        74        58    46.158        24     1.02699     1.10172
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
   10      16        95        79   31.5144         0           -     1.24003
   11      16        95        79   28.6561         0           -     1.24003
   12      16        95        79   26.2732         0           -     1.24003

It's pretty clear where the OSD failed: from second 8 onward, cur MB/s drops to 0 and the finished count stays at 79.
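Spotting the stall can also be automated. A small sketch, assuming the eight-column layout of rados bench's per-second report (sec, Cur ops, started, finished, avg MB/s, cur MB/s, last lat(s), avg lat(s)); the function name is hypothetical:

```python
from typing import Optional

# Data rows copied from the rados bench output above.
SAMPLE = """\
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
"""


def first_stalled_second(output: str) -> Optional[int]:
    """Return the first second whose cur MB/s column is 0, or None."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # skip header, blank and summary lines
        sec, cur_mb_s = int(fields[0]), float(fields[5])
        if cur_mb_s == 0:
            return sec
    return None


if __name__ == "__main__":
    print(first_stalled_second(SAMPLE))
```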

On the other hand, with a replicated pool the client (rados bench) doesn't even notice the OSD failure, which is awesome.

Is this normal behaviour for EC pools?

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Intern, Systems Area (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
