Hi,

On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
Hi! Today I added some new OSDs (nearly doubled) to my luminous cluster. I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I realize now that this should have been done in smaller steps.) This is the current status:

  health: HEALTH_ERR
          336568/1307562 objects misplaced (25.740%)
          Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
          Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
          107 slow requests are blocked > 32 sec
          218 stuck requests are blocked > 4096 sec

  data:
    pools:   2 pools, 1536 pgs
    objects: 638k objects, 2549 GB
    usage:   5210 GB used, 11295 GB / 16506 GB avail
    pgs:     0.195% pgs unknown
             8.138% pgs not active
             6985/1307562 objects degraded (0.534%)
             336568/1307562 objects misplaced (25.740%)
             855 active+clean
             517 active+remapped+backfill_wait
             107 activating+remapped
              31 active+remapped+backfilling
              15 activating+undersized+degraded+remapped
               4 active+undersized+degraded+remapped+backfilling
               3 unknown
               3 peering
               1 stale+active+clean
You need to resolve the unknown/peering/activating PGs first. You have 1536 PGs; assuming replication size 3, that makes 4608 PG copies. Given 25 OSDs and the heterogeneous host sizes, I assume some OSDs hold more than 200 PGs. There is a per-OSD threshold for the number of PGs; once an OSD reaches it, it stops accepting new PGs.
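You can double-check that calculation by querying the replication size and PG count of the pool (the pool name below is just a placeholder, replace it with yours):

  # replication size and PG count of a pool
  ceph osd pool get <pool> size
  ceph osd pool get <pool> pg_num

  # back-of-the-envelope: 1536 PGs * 3 replicas / 25 OSDs ~ 184 PG copies per OSD on average,
  # so with the uneven host weights some OSDs will end up well above 200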
Try increasing the threshold (mon_max_pg_per_osd and/or osd_max_pg_per_osd_hard_ratio; I'm not sure which one applies in your case, consult the documentation) to allow more PGs per OSD. If this is the cause of the problem, the peering and activating PGs should resolve within a short time.
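A rough sketch of what I would try (the values are only examples, not recommendations, and some options may warn that they are not changeable at runtime; in that case put them into ceph.conf and restart the daemons):

  # check what the daemons currently use (osd.0 just as an example)
  ceph daemon osd.0 config show | grep pg_per_osd

  # raise the limits at runtime; 400 and 3 are example values
  ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'
  ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio=3'

  # to persist across restarts, add to the [global] section of ceph.conf:
  #   mon_max_pg_per_osd = 400
  #   osd_max_pg_per_osd_hard_ratio = 3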
You can also check the number of PGs per OSD with 'ceph osd df'; the last column is the current number of PGs.
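For example, to list the OSDs with the most PGs (the awk field assumes PGS is the last column, which is the case on luminous but may differ on other releases):

  # print "osd.<id> <pg count>", sorted by PG count, highest first
  ceph osd df | awk '$1 ~ /^[0-9]+$/ {print "osd." $1, $NF}' | sort -k2 -rn | head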
OSD tree:

  ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
  -1        16.12177 root default
  -16       16.12177     datacenter dc01
  -19       16.12177         pod dc01-agg01
  -10        8.98700             rack dc01-rack02
  -4         4.03899                 host node1001
   0   hdd   0.90999                     osd.0                  up  1.00000 1.00000
   1   hdd   0.90999                     osd.1                  up  1.00000 1.00000
   5   hdd   0.90999                     osd.5                  up  1.00000 1.00000
   2   ssd   0.43700                     osd.2                  up  1.00000 1.00000
   3   ssd   0.43700                     osd.3                  up  1.00000 1.00000
   4   ssd   0.43700                     osd.4                  up  1.00000 1.00000
  -7         4.94899                 host node1002
   9   hdd   0.90999                     osd.9                  up  1.00000 1.00000
  10   hdd   0.90999                     osd.10                 up  1.00000 1.00000
  11   hdd   0.90999                     osd.11                 up  1.00000 1.00000
  12   hdd   0.90999                     osd.12                 up  1.00000 1.00000
   6   ssd   0.43700                     osd.6                  up  1.00000 1.00000
   7   ssd   0.43700                     osd.7                  up  1.00000 1.00000
   8   ssd   0.43700                     osd.8                  up  1.00000 1.00000
  -11        7.13477             rack dc01-rack03
  -22        5.38678                 host node1003
  17   hdd   0.90970                     osd.17                 up  1.00000 1.00000
  18   hdd   0.90970                     osd.18                 up  1.00000 1.00000
  24   hdd   0.90970                     osd.24                 up  1.00000 1.00000
  26   hdd   0.90970                     osd.26                 up  1.00000 1.00000
  13   ssd   0.43700                     osd.13                 up  1.00000 1.00000
  14   ssd   0.43700                     osd.14                 up  1.00000 1.00000
  15   ssd   0.43700                     osd.15                 up  1.00000 1.00000
  16   ssd   0.43700                     osd.16                 up  1.00000 1.00000
  -25        1.74799                 host node1004
  19   ssd   0.43700                     osd.19                 up  1.00000 1.00000
  20   ssd   0.43700                     osd.20                 up  1.00000 1.00000
  21   ssd   0.43700                     osd.21                 up  1.00000 1.00000
  22   ssd   0.43700                     osd.22                 up  1.00000 1.00000

The crush rule is set to chooseleaf rack and (temporarily!) to size 2. Why are PGs stuck in peering and activating? "ceph df" shows that only 1.5 TB are used on the pool, residing on the HDDs - which would perfectly fit the crush rule... (?)
Size 2 within the crush rule or size 2 for the two pools?

Regards,
Burkhard