Re: Erasure code failure

Yes, I am trying it on Luminous.

Well, the bug has been open for 8 months and the fix hasn't been merged yet. I don't know if that is what's preventing me from making it work. Tomorrow I will try testing it again.


On 19/10/2017 at 23:00, David Turner wrote:
Running clusters on various versions of Hammer and Jewel, I haven't had any problems.  I haven't upgraded to Luminous quite yet, but I'd be surprised if there were that severe a regression, especially given how many improvements were made to erasure coding.

On Thu, Oct 19, 2017 at 4:59 PM Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:

Well, I tried it a few days ago and it didn't work for me.

Maybe because of these:

http://tracker.ceph.com/issues/18749

https://github.com/ceph/ceph/pull/17619

I don't know whether it's actually working now.


On 19/10/2017 at 22:55, David Turner wrote:
In a 3 node cluster with EC k=2 m=1, you can turn off one of the nodes and the cluster will still operate normally.  If you lose a disk during this state or another server goes offline, then you lose access to your data.  But assuming that you bring up the third node and let it finish backfilling/recovering before restarting any other nodes, then you're fine.
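
One thing that may be worth checking is the pool's min_size. If I remember right, Luminous defaults erasure-coded pools to min_size = k+1, so with k=2 m=1 a PG that is missing any chunk stops accepting I/O even though the data is still readable from the surviving chunks. A rough sketch of checking and (at your own risk) lowering it, assuming a pool named ecpool:

    ceph osd pool get ecpool min_size    # with this profile the default should be k+1 = 3
    ceph osd pool set ecpool min_size 2  # accept I/O with only 2 of the 3 chunks available

Running at min_size = k means writes land with no surviving redundancy, so another failure while degraded can lose data; it is a stopgap, not a fix.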

On Thu, Oct 19, 2017 at 4:49 PM Jorge Pinilla López <jorpilo@xxxxxxxxx> wrote:
Imagine we have a 3-OSD cluster and I create an erasure-coded pool with k=2, m=1.
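
For concreteness, I mean something along these lines (the profile and pool names are just examples):

    ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=osd
    ceph osd pool create ecpool 32 32 erasure ec21

With k=2 m=1 each object is cut into 2 data chunks plus 1 parity chunk, and crush-failure-domain=osd places one chunk on each of the 3 OSDs (for a 3-node setup, host would be the usual failure domain instead).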

If an OSD fails, we can rebuild the data, but (I think) the whole cluster won't be able to perform I/O.

Wouldn't it be possible to make the cluster work in a degraded mode?
I think it would be a good idea to let the cluster keep working in degraded mode and promise to rebalance/rebuild whenever a third OSD comes back alive.
On reads, it could serve the data from the live chunks, rebuilding the missing ones on the fly if necessary (using CPU to recompute the data before serving, with 0 RTA), or it could rebuild the missing parts up front so that the 2 data chunks actually sit on the 2 live OSDs (with some RTA and space usage), or even do both things at the same time (with high network, CPU and storage cost).
On writes, it could write the 2 data chunks to the live OSDs, and whenever the third OSD comes up, the cluster could rebalance: rebuild the parity chunk and reposition the chunks so all OSDs carry the same amount of data/work.

Would this be possible?


Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A


--

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer Engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
