OK, I think I can answer this myself: the pool was created with a
default min_size of 3, so when one of the OSDs goes down the pool
doesn't perform any IO. Manually changing the pool's min_size to
2 worked great.
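For anyone hitting the same thing, checking and changing min_size looks roughly like this (the pool name "ecpool" is just a placeholder for your EC pool):

ceph osd pool get ecpool min_size      # show the current value (3 by default here)
ceph osd pool set ecpool min_size 2    # allow IO with only k=2 shards available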
On 24/10/2017 at 10:13, Jorge Pinilla López wrote:
I am testing erasure coded pools and doing a rados write test to
try fault tolerance.
I have 3 nodes with 1 OSD each, K=2 M=1.
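For context, a setup like this would be created with something along these lines (the profile and pool names are just examples, the actual ones used here may differ):

ceph osd erasure-code-profile set ec-2-1 k=2 m=1 crush-failure-domain=host
ceph osd pool create ecpool 32 32 erasure ec-2-1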
While performing the write (rados bench -p replicate 100 write), I
stop one of the OSD daemons (for example osd.0), simulating a node
failure, and then the whole write stops and I can't write any data
anymore.
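Roughly the procedure, assuming a systemd-managed cluster (osd.0 as in the example above); the bench output below shows where it stalls:

rados bench -p replicate 100 write     # in one terminal
systemctl stop ceph-osd@0              # on one OSD node, while the bench is running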
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    1      16        28        12   46.8121        48      1.01548    0.616034
    2      16        40        24   47.3907        48      1.04219    0.923728
    3      16        52        36   47.5889        48     0.593145      1.0038
    4      16        68        52   51.6633        64      1.39638     1.08098
    5      16        74        58    46.158        24      1.02699     1.10172
    6      16        83        67   44.4711        36      3.01542     1.18012
    7      16        95        79   44.9722        48     0.776493     1.24003
    8      16        95        79   39.3681         0            -     1.24003
    9      16        95        79   35.0061         0            -     1.24003
   10      16        95        79   31.5144         0            -     1.24003
   11      16        95        79   28.6561         0            -     1.24003
   12      16        95        79   26.2732         0            -     1.24003
It's pretty clear where the OSD failed.
On the other hand, using a replicated pool, the client (rados
bench) doesn't even notice the OSD failure, which is awesome.
Is this normal behaviour for EC pools?
--
Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com