Re: Stuck creating pg

Bart Vanbrabant <bart@xxxxxxxxxxxxx> · Mon, 17 Aug 2015 16:14:28 +0200



    1)

      
      ~# ceph pg 5.6c7 query

      Error ENOENT: i don't have pgid 5.6c7

      
      In the osd log:

      
      2015-08-17 16:11:45.185363 7f311be40700  0 osd.19 64706 do_command r=-2 i don't have pgid 5.6c7
      2015-08-17 16:11:45.185380 7f311be40700  0 log_channel(cluster) log [INF] : i don't have pgid 5.6c7

      
      2) I do not see anything wrong with this rule:

      
          {
              "rule_id": 0,
              "rule_name": "data",
              "ruleset": 0,
              "type": 1,
              "min_size": 1,
              "max_size": 10,
              "steps": [
                  {
                      "op": "take",
                      "item": -1,
                      "item_name": "default"
                  },
                  {
                      "op": "chooseleaf_firstn",
                      "num": 0,
                      "type": "host"
                  },
                  {
                      "op": "emit"
                  }
              ]
          },
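A small Python sketch (my own illustration, not something from the thread) of the sanity check implied above: it parses the rule, confirms that a pool size falls within `min_size`/`max_size`, and checks that there are enough `host` buckets for `chooseleaf_firstn` to put every replica on a different host. It assumes the usual CRUSH meaning of `num` (positive is literal, 0 means the pool size, negative means the pool size minus that many) and takes the host count of 4 from the `ceph osd tree` output further down the thread; `check_rule` and `replicas_for_step` are hypothetical helper names.

```python
import json

# The rule as printed by `ceph osd crush rule dump` (trailing comma dropped).
RULE_JSON = """
{
    "rule_id": 0,
    "rule_name": "data",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {"op": "take", "item": -1, "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}
    ]
}
"""


def replicas_for_step(num, pool_size):
    """CRUSH semantics for `num`: >0 is literal, 0 means pool size, <0 means pool size + num."""
    return num if num > 0 else pool_size + num


def check_rule(rule, pool_size, failure_domain_count):
    """Return a list of problems; an empty list means nothing obviously wrong."""
    problems = []
    if not rule["min_size"] <= pool_size <= rule["max_size"]:
        problems.append("pool size %d outside rule range [%d, %d]"
                        % (pool_size, rule["min_size"], rule["max_size"]))
    for step in rule["steps"]:
        if step["op"] in ("chooseleaf_firstn", "chooseleaf_indep"):
            wanted = replicas_for_step(step["num"], pool_size)
            if wanted > failure_domain_count:
                problems.append("step wants %d distinct %s buckets, only %d exist"
                                % (wanted, step["type"], failure_domain_count))
    return problems


rule = json.loads(RULE_JSON)
# 4 hosts (droplet1..droplet4) according to `ceph osd tree`.
for size in (2, 3):
    print(size, check_rule(rule, pool_size=size, failure_domain_count=4))
```

For both size 2 and size 3 this reports no problems, which matches the impression that the rule itself is fine for 4 hosts.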

      
      3) I rebooted all machines in the cluster and increased the
      replication level of the affected pool to 3, to be safe.
      After recovering from the reboot, the cluster is in the
      following state:

      
      HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 103 requests are blocked > 32 sec; 2 osds have slow requests; pool volumes pg_num 2048 > pgp_num 1400
      pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25,17]
      pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25,17]
      103 ops are blocked > 524.288 sec
      19 ops are blocked > 524.288 sec on osd.19
      84 ops are blocked > 524.288 sec on osd.25
      2 osds have slow requests
      pool volumes pg_num 2048 > pgp_num 1400
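To pull the stuck PGs and their acting sets out of output like this, a regex sketch such as the following could help. It only matches the Hammer-era line formats shown in this thread; other releases may word these lines differently, and `parse_health_detail` is a hypothetical helper, not part of Ceph.

```python
import re

# e.g. "pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25,17]"
STUCK_RE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<kind>\w+) .*?"
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,]*)\]"
)
# e.g. "pool volumes pg_num 2048 > pgp_num 1400"
PGP_RE = re.compile(r"pool (?P<pool>\S+) pg_num (?P<pg>\d+) > pgp_num (?P<pgp>\d+)")


def parse_health_detail(text):
    """Return (stuck, mismatches) parsed from `ceph health detail` text.

    stuck: {pgid: {"kinds": set of stuck kinds, "state": str, "acting": [osd ids]}}
    mismatches: sorted, de-duplicated list of (pool, pg_num, pgp_num)
    """
    stuck = {}
    for m in STUCK_RE.finditer(text):
        entry = stuck.setdefault(m.group("pgid"), {"kinds": set(), "state": "", "acting": []})
        entry["kinds"].add(m.group("kind"))
        entry["state"] = m.group("state")
        entry["acting"] = [int(x) for x in m.group("acting").split(",") if x]
    # The pg_num/pgp_num warning appears both in the summary and the detail lines.
    mismatches = sorted({(m.group("pool"), int(m.group("pg")), int(m.group("pgp")))
                         for m in PGP_RE.finditer(text)})
    return stuck, mismatches


SAMPLE = """\
HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 103 requests are blocked > 32 sec; 2 osds have slow requests; pool volumes pg_num 2048 > pgp_num 1400
pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25,17]
pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25,17]
pool volumes pg_num 2048 > pgp_num 1400
"""

stuck, mismatches = parse_health_detail(SAMPLE)
for pgid, info in stuck.items():
    print(pgid, sorted(info["kinds"]), info["state"], info["acting"])
print(mismatches)
```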

      
      Thanks,

      
      Bart

      
      On 08/17/2015 03:44 PM, minchen wrote:

    
      It looks like the CRUSH rule doesn't work properly after the
        osdmap changed.
      There are 3 unclean pgs: 5.6c7, 5.2c7 and 15.2bd.
      I think you can try the following to help locate the problem:
      1st, run ceph pg <pgid> query to look up the details of the
        pg state,
          e.g. which osd is it blocked by?
      2nd, check the CRUSH rule with
          ceph osd crush rule dump
          and check the crush_ruleset for pools 5 and 15,
          e.g. maybe chooseleaf does not choose the right osds?
      
         
        minchen
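For the second step, a sketch that joins pools to their CRUSH rules from JSON output. It assumes Hammer-era field names (`crush_ruleset` on each pool in `ceph osd dump --format json`, `ruleset` on each rule in `ceph osd crush rule dump --format json`), which may differ in other releases. The sample data is trimmed and partly hypothetical (the name and size of pool 15 are made up), and `rules_for_pools` and `ceph_json` are illustrative helpers, not Ceph APIs.

```python
import json
import subprocess


def ceph_json(*args):
    """Run a ceph CLI command with JSON output and parse it (needs a working ceph client)."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)


def rules_for_pools(osd_dump, rule_dump, pool_ids):
    """Map each requested pool id to (pool name, size, names of rules with its ruleset).

    Assumes Hammer-era field names: pools carry "crush_ruleset", rules carry "ruleset".
    """
    by_ruleset = {}
    for rule in rule_dump:
        by_ruleset.setdefault(rule["ruleset"], []).append(rule["rule_name"])
    result = {}
    for pool in osd_dump["pools"]:
        if pool["pool"] in pool_ids:
            result[pool["pool"]] = (
                pool["pool_name"],
                pool["size"],
                by_ruleset.get(pool["crush_ruleset"], []),  # [] means no such ruleset
            )
    return result


# On a live cluster one would use:
#   osd_dump = ceph_json("osd", "dump")
#   rule_dump = ceph_json("osd", "crush", "rule", "dump")
# Hypothetical, trimmed-down sample data for illustration only:
osd_dump = {"pools": [
    {"pool": 5, "pool_name": "volumes", "size": 3, "crush_ruleset": 0},
    {"pool": 15, "pool_name": "example-pool", "size": 2, "crush_ruleset": 0},
]}
rule_dump = [{"rule_id": 0, "rule_name": "data", "ruleset": 0}]
print(rules_for_pools(osd_dump, rule_dump, {5, 15}))
```

A pool whose ruleset maps to an empty list would point at a missing rule; here both pools resolve to the `data` rule.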

        
        ------------------ Original ------------------
        
          From:  "Bart Vanbrabant";<bart@xxxxxxxxxxxxx>;
          Date:  Sun, Aug 16, 2015 07:27 PM
          To:  "ceph-users"<ceph-users@xxxxxxxxxxxxxx>;
          Subject:  Stuck creating pg
        
        
        Hi,
          

          I have a Ceph cluster with 26 OSDs in 4 hosts, used only
            for RBD for an OpenStack cluster (started at 0.48, I think),
            currently running 0.94.2 on Ubuntu 14.04. A few days ago one
            of the OSDs was at 85% disk usage while only 30% of the raw
            disk space was used. I ran reweight-by-utilization with 150
            as the cutoff level, which reshuffled the data. I also noticed
            that the number of PGs was still at the level from when there
            were fewer disks in the cluster (1300).
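The reweight step is simple arithmetic. As I understand the Hammer behaviour of `ceph osd reweight-by-utilization <threshold>`, an OSD counts as overloaded when its utilization is at or above threshold% of the cluster-wide average, and its reweight is then scaled by average / utilization. The sketch below approximates that (it is not Ceph's code) using the figures from the message; the OSD labels and the second OSD's 32% are hypothetical, and `reweight_overloaded` is an illustrative name.

```python
def reweight_overloaded(avg_util, osd_utils, weights, threshold_pct=120.0):
    """Approximate reweight-by-utilization: {osd: new_reweight} for overloaded OSDs.

    avg_util: cluster-wide used/total (e.g. 0.30 for 30%)
    osd_utils: osd -> that OSD's used/total
    weights: osd -> current reweight (1.0 = not reweighted)
    threshold_pct: cutoff as a percentage of the average (120 is the default, I believe)
    """
    overload = avg_util * threshold_pct / 100.0
    return {
        osd: round(weights[osd] * avg_util / util, 4)
        for osd, util in osd_utils.items()
        if util >= overload
    }


# Figures from the message: ~30% raw usage overall, one OSD at 85%, cutoff 150.
# "osd.A", "osd.B" and the 0.32 value are hypothetical.
print(reweight_overloaded(0.30, {"osd.A": 0.85, "osd.B": 0.32},
                          {"osd.A": 1.0, "osd.B": 1.0}, threshold_pct=150))
```

With a cutoff of 150 and a 30% average, anything at or above 45% is reweighted, so the OSD at 85% drops to a reweight of about 0.35.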
          

          Based on the current guidelines I increased pg_num to
            2048. It created the placement groups except for the last
            one. To try to force the creation of that PG I marked the
            OSDs assigned to it out (ceph osd out), but that made no
            difference. Currently all OSDs are back in, and two other
            PGs are also stuck in an unclean state:
          

          ceph health detail:

          
            HEALTH_WARN 2 pgs degraded; 2 pgs stale; 2 pgs stuck degraded; 1 pgs stuck inactive; 2 pgs stuck stale; 3 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; 59 requests are blocked > 32 sec; 3 osds have slow requests; recovery 221/549658 objects degraded (0.040%); recovery 221/549658 objects misplaced (0.040%); pool volumes pg_num 2048 > pgp_num 1400
            pg 5.6c7 is stuck inactive since forever, current state creating, last acting [19,25]
            pg 5.6c7 is stuck unclean since forever, current state creating, last acting [19,25]
            pg 5.2c7 is stuck unclean for 313513.609864, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 15.2bd is stuck unclean for 313513.610368, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 5.2c7 is stuck undersized for 308381.750768, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 15.2bd is stuck undersized for 308381.751913, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 5.2c7 is stuck degraded for 308381.750876, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 15.2bd is stuck degraded for 308381.752021, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 5.2c7 is stuck stale for 281750.295301, current state stale+active+undersized+degraded+remapped, last acting [9]
            pg 15.2bd is stuck stale for 281750.295293, current state stale+active+undersized+degraded+remapped, last acting [9]
            16 ops are blocked > 268435 sec
            10 ops are blocked > 134218 sec
            10 ops are blocked > 1048.58 sec
            23 ops are blocked > 524.288 sec
            16 ops are blocked > 268435 sec on osd.1
            8 ops are blocked > 134218 sec on osd.17
            2 ops are blocked > 134218 sec on osd.19
            10 ops are blocked > 1048.58 sec on osd.19
            23 ops are blocked > 524.288 sec on osd.19
            3 osds have slow requests
            recovery 221/549658 objects degraded (0.040%)
            recovery 221/549658 objects misplaced (0.040%)
            pool volumes pg_num 2048 > pgp_num 1400
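Two bits of arithmetic behind the pg_num and pgp_num figures above, as a sketch. The commonly quoted sizing guideline is (number of OSDs x 100) / replica count, rounded up to a power of two: 26 OSDs at size 2 gives 1300, rounded up to 2048. Separately, while pgp_num (1400) is below pg_num (2048), a PG's seed is folded by Ceph's `ceph_stable_mod()` before placement, so PGs with a high seed share a placement seed with a lower-numbered PG of the same pool. `ceph_stable_mod` below is a Python port of the function in Ceph's C headers; `placement_seed`, `suggested_pg_count` and `next_power_of_two` are hypothetical helpers for illustration.

```python
def next_power_of_two(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def suggested_pg_count(num_osds, pool_size, pgs_per_osd=100):
    """Rule of thumb: (OSDs * 100) / replicas, rounded up to a power of two."""
    raw = -(-(num_osds * pgs_per_osd) // pool_size)  # ceiling division
    return next_power_of_two(raw)


def ceph_stable_mod(x, b, bmask):
    """Python port of ceph_stable_mod() from Ceph's C headers."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def placement_seed(pgid, pgp_num):
    """Fold the hex seed of a 'pool.seed' PG id by pgp_num (hypothetical helper).

    Ceph derives the CRUSH input from this folded seed and the pool id, so two
    PGs of the same pool with the same folded seed map to the same OSDs.
    """
    pool, seed_hex = pgid.split(".")
    mask = next_power_of_two(pgp_num) - 1  # equals Ceph's pgp_num_mask
    return "%s.%x" % (pool, ceph_stable_mod(int(seed_hex, 16), pgp_num, mask))


print(suggested_pg_count(26, 2))  # 26 OSDs, size 2 -> 1300 -> 2048
for pgid in ("5.6c7", "5.2c7", "15.2bd"):
    print(pgid, "pgp_num 1400 ->", placement_seed(pgid, 1400),
          "| pgp_num 2048 ->", placement_seed(pgid, 2048))
```

Under this folding, 5.6c7 (seed 0x6c7 = 1735) maps to the same placement seed as 5.2c7 while pgp_num stays at 1400; with pgp_num equal to pg_num each PG keeps its own seed.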
          
          
          OSD 9 was the primary when the PG creation process got
            stuck. This OSD has been removed and added again (not only
            marked out, but also removed from the CRUSH map and added
            again).
          

          The bad data distribution was probably caused by the low
            number of PGs and mainly by bad weighting of the OSDs. I
            changed the CRUSH map to give each OSD the same weight, but
            that does not resolve these problems either:
          

          ceph osd tree:
          
            ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
            -1 6.50000 pool default
            -6 2.00000     host droplet4
            16 0.25000         osd.16         up  1.00000          1.00000
            20 0.25000         osd.20         up  1.00000          1.00000
            21 0.25000         osd.21         up  1.00000          1.00000
            22 0.25000         osd.22         up  1.00000          1.00000
             6 0.25000         osd.6          up  1.00000          1.00000
            18 0.25000         osd.18         up  1.00000          1.00000
            19 0.25000         osd.19         up  1.00000          1.00000
            23 0.25000         osd.23         up  1.00000          1.00000
            -5 1.50000     host droplet3
             3 0.25000         osd.3          up  1.00000          1.00000
            13 0.25000         osd.13         up  1.00000          1.00000
            15 0.25000         osd.15         up  1.00000          1.00000
             4 0.25000         osd.4          up  1.00000          1.00000
            25 0.25000         osd.25         up  1.00000          1.00000
            14 0.25000         osd.14         up  1.00000          1.00000
            -2 1.50000     host droplet1
             7 0.25000         osd.7          up  1.00000          1.00000
             1 0.25000         osd.1          up  1.00000          1.00000
             0 0.25000         osd.0          up  1.00000          1.00000
             9 0.25000         osd.9          up  1.00000          1.00000
            12 0.25000         osd.12         up  1.00000          1.00000
            17 0.25000         osd.17         up  1.00000          1.00000
            -4 1.50000     host droplet2
            10 0.25000         osd.10         up  1.00000          1.00000
             8 0.25000         osd.8          up  1.00000          1.00000
            11 0.25000         osd.11         up  1.00000          1.00000
             2 0.25000         osd.2          up  1.00000          1.00000
            24 0.25000         osd.24         up  1.00000          1.00000
             5 0.25000         osd.5          up  1.00000          1.00000
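With every OSD at 0.25, the host weights in the tree follow directly from the OSD counts (8 on droplet4, 6 on each of the other hosts), and CRUSH should give each host a share of the data roughly proportional to its weight. A small sketch of that arithmetic; the OSD-to-host mapping is copied from the tree above, and the variable names are illustrative only.

```python
from fractions import Fraction

# OSD ids per host, copied from the `ceph osd tree` output above.
HOSTS = {
    "droplet4": [16, 20, 21, 22, 6, 18, 19, 23],
    "droplet3": [3, 13, 15, 4, 25, 14],
    "droplet1": [7, 1, 0, 9, 12, 17],
    "droplet2": [10, 8, 11, 2, 24, 5],
}
OSD_WEIGHT = Fraction(1, 4)  # every OSD has CRUSH weight 0.25

host_weights = {host: OSD_WEIGHT * len(osds) for host, osds in HOSTS.items()}
total = sum(host_weights.values())

for host, weight in host_weights.items():
    share = weight / total  # expected fraction of the data on this host
    print("%s: weight %.2f, expected share %.1f%%" % (host, weight, 100 * share))
print("root weight %.2f, OSD count %d" % (total, sum(map(len, HOSTS.values()))))
```

This reproduces the root weight of 6.5 and puts roughly 30.8% of the data on droplet4 and about 23.1% on each of the other hosts.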
          
          
          I also restarted all OSDs and monitors several times,
            with no change. The pool in which the PG is stuck has
            replication level 2. I have run out of things to try. Does
            anyone have something else I can try?
          

          Regards,
          Bart
          

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com