Hi David,
Thank you for the great explanation of the weights. I thought that Ceph was adjusting them based on disk size, but it seems it's not.
But the problem was not that the disk was full; I think the node was failing because of a software bug, since the disk was by no means full:
/dev/sdb1  976284608  172396756  803887852  18%  /var/lib/ceph/osd/ceph-1
Now the question is whether I can safely add this OSD back to the cluster. Is that possible?
Best regards,
On 14/09/17 23:29, David Turner wrote:
Your weights should more closely represent the size
of the OSDs. OSD3 and OSD6 are weighted properly, but your
other 3 OSDs have the same weight even though OSD0 is twice the
size of OSD2 and OSD4.
Your OSD weights are what I thought you were referring to when you said you set the crush map to 1. At some point it does look like you set all of your OSD weights to 1, which would apply to OSD1 as well. If the OSD was too small for that much data, it would have filled up and been too full to start. Can you mount that disk and see how much free space is on it?
Just so you understand what that weight is: it is how much data the cluster is going to put on the OSD. The default is for the weight to be the size of the OSD in TiB (1024-based, as opposed to TB, which is 1000-based). If you set the weight of a 1TB disk and a 4TB disk both to 1, then the cluster will try to give them the same amount of data. If you set the 4TB disk to a weight of 4, then the cluster will try to give it 4x more data than the 1TB drive (usually what you want).
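As a rough sketch of the arithmetic (osd.<id> below is just a placeholder): the weight is the disk size in bytes divided by 2^40, and you change it with the standard reweight command:

    # weight = size in TiB = size in bytes / 2^40
    # 1TB disk (10^12 bytes)    -> ~0.909
    # 4TB disk (4x10^12 bytes)  -> ~3.638
    ceph osd crush reweight osd.<id> 3.638

Keep in mind that every reweight triggers some data movement.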
In your case, your 926G OSD0 has a weight of 1 and your 460G OSD2 has a weight of 1, so the cluster thinks they should each receive the same amount of data (which they did; they each have ~275GB of data). OSD3 has a weight of 1.36380 (its size in TiB) and OSD6 has a weight of 0.90919, so they have basically the same %used space (17%) rather than the same amount of data, because their weights are based on their sizes.
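If you decide to bring the other three in line with their sizes, something like the following should do it. The values are only approximations I computed from the sizes in your `ceph osd df` output, so double-check them, and expect some rebalancing after each change:

    ceph osd crush reweight osd.0 0.904    # 926G / 1024
    ceph osd crush reweight osd.2 0.449    # 460G / 1024
    ceph osd crush reweight osd.4 0.454    # 465G / 1024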
As long as you had enough replicas of your data in the cluster for it to recover from removing OSD1, and your cluster is now health_ok without any missing objects, then there is nothing you need off of OSD1 and Ceph recovered from the lost disk successfully.
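To confirm that, you can check the cluster state, for example:

    ceph status           # overall health and PG states
    ceph health detail    # details on any degraded or unfound PGs/objects

and make sure nothing is reported as degraded, unfound, or inconsistent.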
Hello,
I was on an old version of Ceph, and it showed a warning saying:

crush map has straw_calc_version=0

I read that adjusting it would only trigger a rebalance, so the admin should choose when to do it. So I went straight ahead and ran:

ceph osd crush tunables optimal
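(As far as I understand from the docs, that warning can also be addressed by changing only that one tunable, something like:

    ceph osd crush set-tunable straw_calc_version 1

but I went for the full tunables profile instead.)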
It rebalanced as expected, but then I started to have lots of PGs in a bad state. I discovered that it was because of my OSD1. I thought it was a disk failure, so I added a new OSD6 and the system started to rebalance. Anyway, the OSD was not starting.
I thought about wiping it all, but I preferred to leave the disk as it was, with the journal intact, in case I can recover data from it. (See the thread "Scrub failing all the time, new inconsistencies keep appearing".)
So here's the information, but with OSD1 already replaced by OSD3, sorry.
ID WEIGHT  REWEIGHT SIZE  USE  AVAIL %USE  VAR  PGS
 0 1.00000  1.00000  926G 271G  654G 29.34 1.10 369
 2 1.00000  1.00000  460G 284G  176G 61.67 2.32 395
 4 1.00000  1.00000  465G 151G  313G 32.64 1.23 214
 3 1.36380  1.00000 1396G 239G 1157G 17.13 0.64 340
 6 0.90919  1.00000  931G 164G  766G 17.70 0.67 210
              TOTAL 4179G 1111G 3067G 26.60
MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99
As I said, I still have OSD1 intact, so I can do whatever you need except re-adding it to the cluster, since I don't know what that will do; it might cause havoc.
Best regards,
On 14/09/17 17:12, David Turner wrote:
What do you mean by "updated
crush map to 1"? Can you please provide a copy of
your crush map and `ceph osd df`?
Hi,
I recently updated the crush map to 1 and did all the relocation of the PGs. At the end, I found that one of the OSDs is not starting.
This is what it shows:
2017-09-13 10:37:34.287248 7f49cbe12700 -1 ***
Caught signal (Aborted) **
in thread 7f49cbe12700 thread_name:filestore_sync
ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*, char
const*, int, char const*)+0x26b) [0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb)
[0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd)
[0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- begin dump of recent events ---
-3> 2017-09-13 10:37:34.253808
7f49dac6e8c0 5 osd.1 pg_epoch: 6293 pg[10.8c( v
6220'575937 (4942'572901,6220'575937]
local-les=6235 n=282 ec=419 les/c/f 6235/6235/0
6293/6293/6290) [1,2]/[2] r=-1 lpr=0
pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive
NOTIFY NIBBLEWISE] exit Initial 0.029683 0
0.000000
-2> 2017-09-13 10:37:34.253848
7f49dac6e8c0 5 osd.1 pg_epoch: 6293 pg[10.8c( v
6220'575937 (4942'572901,6220'575937]
local-les=6235 n=282 ec=419 les/c/f 6235/6235/0
6293/6293/6290) [1,2]/[2] r=-1 lpr=0
pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive
NOTIFY NIBBLEWISE] enter Reset
-1> 2017-09-13 10:37:34.255018
7f49dac6e8c0 5 osd.1 pg_epoch: 6293
pg[10.90(unlocked)] enter Initial
0> 2017-09-13 10:37:34.287248 7f49cbe12700
-1 *** Caught signal (Aborted) **
in thread 7f49cbe12700 thread_name:filestore_sync
ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*, char
const*, int, char const*)+0x26b) [0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb)
[0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd)
[0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or `objdump -rdS
<executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.1.log
--- end dump of recent events ---
Is there any way to recover it or should I open a
bug?
Best regards