Re: Erasure Pool OSD fail




Well, you should use m > 1: the more coding chunks you have, the lower the risk and the better the performance.

You don't read twice as much data; you read it from different sources. Furthermore, you may even read less data and have to rebuild it, because on erasure-coded pools the data is not replicated.
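
For example (just a sketch, not from the original mails), a profile with m=2 could look roughly like this. The profile name is a placeholder; with host as the failure domain, k=2 m=2 needs at least 4 hosts, so a 3-node cluster would need crush-failure-domain=osd or more nodes (on pre-Luminous releases the key is ruleset-failure-domain):

    ceph osd erasure-code-profile set ec-k2m2 k=2 m=2 crush-failure-domain=host
    ceph osd erasure-code-profile get ec-k2m2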


On the other hand, the configuration is not as bad as you think, it's just different.

3-node cluster

Replicated pool, size = 2

    - you can take 1 failure, then rebalance and take another failure (at most 2 separate failures).

    - you use 2x the data in raw space.

    - you have to write 2x the data: the full data on one node and the full data on a second node.

Erasure-coded pool (k=2, m=1)

    - you can only lose 1 node.

    - you use less space: with k=2, m=1 the raw usage is (k+m)/k = 1.5x the data instead of 2x.

    - as you don't write 2x the data, writes are also faster: you write half of the data on one node, the other half on another, and the parity chunk on a third node, so the write work is much more distributed.

    - reads are slower because you need all the data chunks.


In both configurations, if you have corrupted data you lose your data, so that's not really a point of comparison.

A replicated pool can handle far more read-intensive workloads, while erasure-coded pools are designed for big writes but relatively few reads.


I have checked myself that both configurations can work with a 3-node cluster, so it's not a matter of a better and a worse configuration; it really depends on your workload. And the best thing :) you can have both on the same OSDs!
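
As a rough sketch of what that could look like (pool and profile names and PG counts are placeholders, not from this thread):

    # replicated pool with 2 copies
    ceph osd pool create rep-pool 64 64 replicated
    ceph osd pool set rep-pool size 2

    # erasure-coded pool (k=2, m=1) living on the same OSDs
    ceph osd erasure-code-profile set ec-k2m1 k=2 m=1 crush-failure-domain=host
    ceph osd pool create ec-pool 64 64 erasure ec-k2m1

By default both pools map onto all OSDs, so they share the same disks unless a CRUSH rule says otherwise.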


On 24/10/2017 at 12:37, Eino Tuominen wrote:

Hello,


Correct me if I'm wrong, but isn't your configuration just twice as bad as running with replication size=2? With replication size=2, when you lose a disk you lose data if even one defective block is found while Ceph is reconstructing the PGs that had a replica on the failed disk. Now, with your setup you have to be able to read twice as much data correctly in order to reconstruct the PGs. When using EC I think you have to use m>1 in production.


-- 

  Eino Tuominen



From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Jorge Pinilla López <jorpilo@xxxxxxxxx>
Sent: Tuesday, October 24, 2017 11:24
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Erasure Pool OSD fail
 

Okay, I think I can answer myself: the pool is created with a default min_size of 3, so when one of the OSDs goes down the pool doesn't perform any I/O. Manually changing the pool's min_size to 2 worked great.
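
For reference, a minimal sketch of that check and change (the pool name is a placeholder):

    ceph osd pool get ecpool min_size     # shows the current value, 3 in this case
    ceph osd pool set ecpool min_size 2   # allow I/O with only 2 (k) chunks available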


On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
I am testing erasure-coded pools and doing a rados write test to try out fault tolerance.
I have 3 nodes with 1 OSD each, k=2, m=1.
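
For context, a setup like that might be created roughly as follows (profile and pool names are placeholders; on pre-Luminous releases the failure-domain key is ruleset-failure-domain instead of crush-failure-domain):

    ceph osd erasure-code-profile set ec-k2m1 k=2 m=1 crush-failure-domain=host
    ceph osd pool create ecpool 64 64 erasure ec-k2m1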

While performing the write (rados bench -p replicate 100 write), I stop one of the OSD daemons (for example osd.0), simulating a node failure, and then the whole write stops and I can't write any data anymore.

  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    1      16        28        12   46.8121        48     1.01548    0.616034
    2      16        40        24   47.3907        48     1.04219    0.923728
    3      16        52        36   47.5889        48    0.593145      1.0038
    4      16        68        52   51.6633        64     1.39638     1.08098
    5      16        74        58    46.158        24     1.02699     1.10172
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
   10      16        95        79   31.5144         0           -     1.24003
   11      16        95        79   28.6561         0           -     1.24003
   12      16        95        79   26.2732         0           -     1.24003

It's pretty clear where the OSD failed.

On the other hand, using a replicated pool, the client (rados bench) doesn't even notice the OSD failure, which is awesome.

Is this normal behaviour on EC pools?

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A





--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
