Dear Robert,
Yes, you're right. The two removed OSDs of those PGs were on the same
host, which contradicts my rules (that's the reason I removed them).
Unfortunately the partitions of those disks have all been formatted, so I
cannot recover the data.
However, the command "ceph pg force_create_pg <pg ID>" plus
restarting the OSD daemons worked to clean up the stale PGs. Now my Ceph
health is OK and the RBD service works normally.
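For reference, the fix was roughly the following on my side (the PG ID and
the restart command below are only examples from my setup; adjust per node):

    # recreate one of the stale PGs (one ID from the list further down)
    ceph pg force_create_pg 17.c6
    # then restart the OSD daemons on each node, e.g. on a sysvinit setup
    service ceph restart osd.0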
Many thanks for your help,
FaHui
On 2015/4/24 at 10:08 AM, Robert LeBlanc wrote:
What hosts were those OSDs on? I'm concerned that two
OSDs for some of the PGs were adjacent, and if that placed them
on the same host, it would be contrary to your rules and
something deeper would be wrong.
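A quick way to check is the CRUSH tree, which lists each OSD
nested under its host bucket:

    ceph osd tree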
Did you format the disks that were taken out of the
cluster? Can you mount the partitions and see the files and
directories? If so, you can probably recover the data using the
recovery/dev tools.
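As a rough sketch (assuming the old OSDs were FileStore, and with
placeholder device/mount names), something like:

    # mount the old OSD partition read-only on another box
    mount -o ro /dev/sdb1 /mnt/old-osd
    # on a FileStore OSD each PG lives under current/<pgid>_head
    ls /mnt/old-osd/current | grep '^17\.'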
You may be able to force create the missing PGs using
"ceph pg force_create_pg <pg.id>". This may or may not work, I
don't remember.
If you just don't care about losing the data, you can
delete the pool and create a new one. This should work for sure,
but it loses any data you might still have had. If this pool
was full of RBD, then there is a high possibility that all of
your RBD images had chunks in the missing PGs. If you choose not
to try to restore the PGs using the tools, I'd be inclined to
delete the pool and restore from backup so as not to be surprised
by data corruption in the images. Neither option is ideal or
quick.
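A hedged sketch of that last option (the pool name and PG counts here
are only guesses; check yours with "ceph osd lspools"):

    # delete the pool -- irreversible, the name must be given twice
    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
    # recreate it; 512 PGs is only an example value
    ceph osd pool create rbd 512 512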
Robert LeBlanc
Sent from a mobile device; please excuse any typos.
On Apr 23, 2015 6:42 PM, "FaHui Lin" <fahui.lin@xxxxxxxxxx>
wrote:
Hi, thank you for your response.
Well, I've not only taken out but also completely removed
both OSDs of that PG (with "ceph osd rm" and by deleting
everything in /var/lib/ceph/osd/<related OSDs>), and did the
same for all the other stale PGs.
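For context, the removal on my side was roughly the usual per-OSD sequence
(osd.15 below is just one example, and the data path is the default one):

    ceph osd crush rm osd.15          # drop it from the CRUSH map
    ceph auth del osd.15              # remove its auth key
    ceph osd rm 15                    # remove the OSD id from the cluster
    rm -rf /var/lib/ceph/osd/ceph-15  # wipe the local data directory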
The main problem I have is that those stale PGs (which are missing all
the OSDs I've removed) not only trigger a Ceph health warning, but
also prevent other machines from mounting the Ceph RBD.
Here's the full crush map. The OSDs I removed were
osd.5~19.
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 500
# devices
device 0 osd.0
device 1 device1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 device5
device 6 device6
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 device13
device 14 device14
device 15 device15
device 16 device16
device 17 device17
device 18 device18
device 19 device19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root
# buckets
host XX-ceph01 {
id -2 # do not change unnecessarily
# weight 160.040
alg straw
hash 0 # rjenkins1
item osd.0 weight 40.010
item osd.2 weight 40.010
item osd.3 weight 40.010
item osd.4 weight 40.010
}
host XX-ceph02 {
id -3 # do not change unnecessarily
# weight 320.160
alg straw
hash 0 # rjenkins1
item osd.20 weight 40.020
item osd.21 weight 40.020
item osd.22 weight 40.020
item osd.23 weight 40.020
item osd.24 weight 40.020
item osd.25 weight 40.020
item osd.26 weight 40.020
item osd.27 weight 40.020
}
root default {
id -1 # do not change unnecessarily
# weight 480.200
alg straw
hash 0 # rjenkins1
item XX-ceph01 weight 160.040
item XX-ceph02 weight 320.160
}
# rules
rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule metadata {
ruleset 1
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule rbd {
ruleset 2
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
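The text map above was dumped with something like:

    ceph osd getcrushmap -o /tmp/crush.bin          # grab the compiled map
    crushtool -d /tmp/crush.bin -o /tmp/crush.txt   # decompile it to text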
List of some stale pgs:
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
17.c6 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.358613 0'0 2706:216 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:29:34.882038 0'0 2015-04-16 02:29:34.882038
17.c7 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.304621 0'0 2718:262 [15,18] 15 [15,18] 15 0'0 2015-04-20 09:15:39.363310 0'0 2015-04-20 09:15:39.363310
17.c1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:01.073681 0'0 2706:199 [19,16] 19 [19,16] 19 0'0 2015-04-15 12:37:11.741251 0'0 2015-04-15 12:37:11.741251
17.de 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.436796 0'0 2718:267 [15] 15 [15] 15 0'0 2015-04-13 07:56:01.760824 0'0 2015-04-13 07:56:01.760824
17.da 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.001087 0'0 2718:232 [14] 14 [14] 14 0'0 2015-04-19 15:45:53.304596 0'0 2015-04-19 15:45:53.304596
17.d9 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.472983 0'0 2718:270 [14] 14 [14] 14 0'0 2015-04-16 01:55:44.183550 0'0 2015-04-16 01:55:44.183550
17.d7 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:53.839134 0'0 2718:68 [17] 17 [17] 17 0'0 2015-04-16 00:06:27.998210 0'0 2015-04-16 00:06:27.998210
17.d5 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.311352 0'0 2718:226 [18,17] 18 [18,17] 18 0'0 2015-04-15 20:52:33.372369 0'0 2015-04-15 20:52:33.372369
17.d0 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.850188 0'0 2718:213 [15,12] 15 [15,12] 15 0'0 2015-04-19 15:40:32.215234 0'0 2015-04-19 15:40:32.215234
17.d1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.849996 0'0 2718:227 [15,12] 15 [15,12] 15 0'0 2015-04-15 19:03:38.137147 0'0 2015-04-15 19:03:38.137147
17.ae 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310506 0'0 2718:231 [18,12] 18 [18,12] 18 0'0 2015-04-16 02:23:35.031329 0'0 2015-04-16 02:23:35.031329
17.ac 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002406 0'0 2718:66 [12] 12 [12] 12 0'0 2015-04-16 02:23:33.023476 0'0 2015-04-16 02:23:33.023476
17.aa 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:25.983034 0'0 2718:213 [15,14] 15 [15,14] 15 0'0 2015-04-19 15:32:38.896039 0'0 2015-04-19 15:32:38.896039
17.ab 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.836133 0'0 2718:260 [12,17] 12 [12,17] 12 0'0 2015-04-19 15:32:44.905707 0'0 2015-04-19 15:32:44.905707
17.a8 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.361319 0'0 2706:212 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:23:32.026015 0'0 2015-04-16 02:23:32.026015
17.a6 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002804 0'0 2718:96 [18] 18 [18] 18 0'0 2015-04-20 14:02:29.334181 0'0 2015-04-20 14:02:29.334181
17.a4 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310707 0'0 2718:232 [18,17] 18 [18,17] 18 0'0 2015-04-16 02:22:12.018136 0'0 2015-04-16 02:22:12.018136
17.a2 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:11.624952 0'0 2718:200 [15,17] 15 [15,17] 15 0'0 2015-04-15 10:42:37.880699 0'0 2015-04-15 10:42:37.880699
17.a0 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.469600 0'0 2718:66 [18] 18 [18] 18 0'0 2015-04-16 02:22:08.992748 0'0 2015-04-16 02:22:08.992748
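The listing above is from the PG dump output; the stale ones can also be
pulled out directly with, e.g.:

    ceph pg dump_stuck stale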
The OSDs of those PGs (both primary and secondary) are totally
gone, and I cannot find a way to repair them.
I've set up another machine with new drive partitions and
tried to re-create the OSDs I had removed on it, but those would
come up as osd.28, 29, etc. That's why I wondered how to change
the ID number of an OSD.
Regardless of the data loss (which I think has already
happened), I'd like to get the Ceph service back to normal asap.
Is there any way to deal with those stale PGs? (Such as
re-creating the OSDs they need, injecting existing OSDs into
those PGs, or even killing those PGs?)
And since I'm not experienced, I may need more concrete
guidance (i.e. the approach with actual ceph commands). Many thanks for
your help.
Best Regards,
FaHui
On 2015/4/23 at 10:53 PM, Robert LeBlanc wrote:
A full CRUSH dump would be helpful, as well
as knowing which OSDs you took out. If you didn't take
17 out as well as 15, then you might be OK. If the OSDs
still show up in your CRUSH map, then try to remove them
from the CRUSH map with 'ceph osd crush rm osd.15'.
If you took out both OSDs, you will need to use
some of the recovery tools. I believe the procedure is
roughly: mount the drive in another box, extract the
PGs needed, then shut down the primary OSD for that
PG, inject the PG into the OSD, then start it up and
it should replicate. I haven't done it myself
(probably something I should do in case I ever run
into the problem).
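A rough, unverified sketch of that with ceph-objectstore-tool (the paths,
OSD id, and PG id are all placeholders):

    # on the box holding the old disk: export the PG from its data dir
    ceph-objectstore-tool --data-path /mnt/old-osd \
        --journal-path /mnt/old-osd/journal \
        --op export --pgid 17.c6 --file /tmp/17.c6.export
    # on the current primary for that PG: stop that OSD, import, start it again
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --journal-path /var/lib/ceph/osd/ceph-0/journal \
        --op import --file /tmp/17.c6.export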