Hi David,
What I want is to add the OSD back with its data, yes, but avoiding any trouble that could
arise from the time it was out. Is that possible? I suppose some PGs have been updated
since then. Will Ceph handle that gracefully?
Ceph status is getting
worse every day.
ceph status
    cluster 9028f4da-0d77-462b-be9b-dbdf7fa57771
     health HEALTH_ERR
            6 pgs inconsistent
            31 scrub errors
            too many PGs per OSD (305 > max 300)
     monmap e12: 2 mons at {blue-compute=172.16.0.119:6789/0,red-compute=172.16.0.100:6789/0}
            election epoch 4328, quorum 0,1 red-compute,blue-compute
      fsmap e881: 1/1/1 up {0=blue-compute=up:active}
     osdmap e7120: 5 osds: 5 up, 5 in
            flags require_jewel_osds
      pgmap v66976120: 764 pgs, 6 pools, 555 GB data, 140 kobjects
            1111 GB used, 3068 GB / 4179 GB avail
                 758 active+clean
                   6 active+clean+inconsistent
  client io 384 kB/s wr, 0 op/s rd, 83 op/s wr
I want to add the old OSD back, rebalance so the copies are spread across more
hosts/OSDs, and then take it out again.
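Concretely, the sequence I had in mind is roughly this (only a sketch; since osd.1 is no
longer in the osdmap, its auth key and OSD id would probably need to be recreated first,
and <host> is just a placeholder for the node it lives on):

    # avoid extra data movement while the OSD comes back
    ceph osd set noout
    # put it back in the CRUSH map with weight 0 so nothing maps to it yet
    ceph osd crush add osd.1 0 host=<host>
    # start the daemon on its existing data directory, e.g.
    systemctl start ceph-osd@1
    # raise the weight to its size in TiB so it rebalances in, per your earlier advice
    ceph osd crush reweight osd.1 0.90
    # ...and later take it out again
    ceph osd out osd.1
    ceph osd unset noout

Does that sound reasonable, or will the stale PG data on it cause problems?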
Best regards,
On 19/09/17 14:47, David Turner wrote:
Are you asking to add the OSD back with its data, or to add it back in as a fresh OSD?
What is your `ceph status`?
Hi David,
Thank you for the great explanation of the weights. I thought Ceph was adjusting them
based on disk size, but it seems it is not.
But I don't think that was the problem. I think the OSD was failing because of a software
bug, since the disk was nowhere near full:
/dev/sdb1      976284608 172396756 803887852  18% /var/lib/ceph/osd/ceph-1
Now the question is whether I can add this OSD back safely. Is that possible?
Best regards,
On
14/09/17 23:29, David Turner wrote:
Your weights should more closely represent
the size of the OSDs. OSD3 and OSD6 are weighted
properly, but your other 3 OSDs have the same weight
even though OSD0 is twice the size of OSD2 and OSD4.
Your OSD weights are what I thought you were
referring to when you said you set the crush map to
1. At some point it does look like you set all of
your OSD weights to 1, which would also apply to OSD1. If
that OSD was too small for that much data, it would
have filled up and been too full to start. Can you
mount that disk and see how much free space is on it?
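Something along these lines should work (I'm guessing at the device name; replace
/dev/sdb1 with whatever that OSD was on):

    mkdir -p /mnt/osd1
    mount /dev/sdb1 /mnt/osd1
    df -h /mnt/osd1
    umount /mnt/osd1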
Just so you understand what that weight is, it is
how much data the cluster is going to put on it. The
default is for the weight to be the size of the OSD in
TiB (1024 based instead of TB which is 1000). If you
set the weight of a 1TB disk and a 4TB disk both to 1,
then the cluster will try and give them the same
amount of data. If you set the 4TB disk to a weight
of 4, then the cluster will try to give it 4x more
data than the 1TB drive (usually what you want).
In your case, your 926G OSD0 has a weight of 1 and
your 460G OSD2 has a weight of 1, so the cluster thinks
they should each receive the same amount of data
(which they did: each holds roughly 275 GB). OSD3
has a weight of 1.36380 (its size in TiB) and OSD6 has
a weight of 0.90919, and they have basically the same
%used space (17%) rather than the same amount of
data, because the weight is based on their size.
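If you want to bring the weights in line with the sizes, it would be something like the
following (values are my rough size-in-TiB calculations from your `ceph osd df`, so
double-check them, and note that changing them will trigger data movement):

    # weight = size in TiB: 926G/1024 ~ 0.90, 460G/1024 ~ 0.45, 465G/1024 ~ 0.45
    ceph osd crush reweight osd.0 0.90
    ceph osd crush reweight osd.2 0.45
    ceph osd crush reweight osd.4 0.45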
As long as you had enough replicas of your data in
the cluster for it to recover from you removing OSD1
such that your cluster is health_ok without any
missing objects, then there is nothing that you need
off of OSD1 and ceph recovered from the lost disk
successfully.
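A quick way to confirm that nothing is missing before you wipe it, for example:

    ceph health detail          # should show no unfound or degraded objects
    ceph pg dump_stuck unclean  # should come back empty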
Hello,
I was on an old version of Ceph, and it showed a warning saying:

    crush map has straw_calc_version=0

I read that adjusting it would only trigger a rebalance of everything, so the admin should
choose when to do it. So I went straight ahead and ran:

    ceph osd crush tunables optimal
It rebalanced as expected, but then I started to see lots of PGs in a bad state. I
discovered it was because of my OSD1. I thought it was a disk failure, so I added a new
OSD6 and the system started to rebalance. In any case, the OSD was not starting.
I thought about wiping it all, but I preferred to leave the disk as it was, with the
journal intact, in case I can recover data from it. (See the mail: "Scrub failing all
the time, new inconsistencies keep appearing".)
So here's the information. But it has OSD1
replaced by OSD3, sorry.
ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
 0 1.00000  1.00000  926G  271G  654G 29.34 1.10 369
 2 1.00000  1.00000  460G  284G  176G 61.67 2.32 395
 4 1.00000  1.00000  465G  151G  313G 32.64 1.23 214
 3 1.36380  1.00000 1396G  239G 1157G 17.13 0.64 340
 6 0.90919  1.00000  931G  164G  766G 17.70 0.67 210
              TOTAL 4179G 1111G 3067G 26.60
MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99
As I said, I still have OSD1 intact, so I can do whatever you need with it, except
re-adding it to the cluster, since I don't know what that would do; it might cause havoc.
Best regards,
On
14/09/17 17:12, David Turner wrote:
What do you mean by "updated
crush map to 1"? Can you please provide a
copy of your crush map and `ceph osd df`?
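For the crush map, something like this should do it (the file names are just examples):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    ceph osd df tree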
Hi,
I recently updated the crush map to 1, and all the PG relocation completed. At the end I
found that one of the OSDs is not starting.
This
is what it shows:
2017-09-13 10:37:34.287248 7f49cbe12700
-1 *** Caught signal (Aborted) **
in thread 7f49cbe12700
thread_name:filestore_sync
ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*,
char const*, int, char const*)+0x26b)
[0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb)
[0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd)
[0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or
`objdump -rdS <executable>` is
needed to interpret this.
--- begin dump of recent events ---
-3> 2017-09-13 10:37:34.253808
7f49dac6e8c0 5 osd.1 pg_epoch: 6293
pg[10.8c( v 6220'575937
(4942'572901,6220'575937] local-les=6235
n=282 ec=419 les/c/f 6235/6235/0
6293/6293/6290) [1,2]/[2] r=-1 lpr=0
pi=6234-6292/24 crt=6220'575937 lcod 0'0
inactive NOTIFY NIBBLEWISE] exit Initial
0.029683 0 0.000000
-2> 2017-09-13 10:37:34.253848
7f49dac6e8c0 5 osd.1 pg_epoch: 6293
pg[10.8c( v 6220'575937
(4942'572901,6220'575937] local-les=6235
n=282 ec=419 les/c/f 6235/6235/0
6293/6293/6290) [1,2]/[2] r=-1 lpr=0
pi=6234-6292/24 crt=6220'575937 lcod 0'0
inactive NOTIFY NIBBLEWISE] enter Reset
-1> 2017-09-13 10:37:34.255018
7f49dac6e8c0 5 osd.1 pg_epoch: 6293
pg[10.90(unlocked)] enter Initial
0> 2017-09-13 10:37:34.287248
7f49cbe12700 -1 *** Caught signal
(Aborted) **
in thread 7f49cbe12700
thread_name:filestore_sync
ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*,
char const*, int, char const*)+0x26b)
[0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb)
[0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd)
[0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or
`objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.1.log
--- end dump of recent events ---
Is there any way to recover it or should
I open a bug?
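If it would help for a bug report, I can also rerun the OSD in the foreground with more
verbose logging, something along these lines (I have not tried these exact options yet):

    ceph-osd -i 1 -f --debug_osd 20 --debug_filestore 20 --debug_ms 1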
Best regards
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com