Inactive PG rebuild is not prioritized

Good morning,

after another disk failure, we currently have 7 inactive PGs [1], which
are stalling I/O for the affected VMs.

It seems that Ceph, when rebuilding, does not focus on repairing the
inactive PGs first, which surprised us quite a lot:

instead of recovering them before anything else, it mixes them in with
the PGs that are in active+undersized+degraded+remapped+backfill_wait.

Is this a misconfiguration on our side, or a design aspect of Ceph?
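
For reference, I assume the individual inactive PGs can be listed with
something like the following (the prompt is only illustrative):

server3:~# ceph health detail           # shows the PGs behind "Reduced data availability"
server3:~# ceph pg dump_stuck inactive  # dumps PGs stuck in a non-active state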

I have attached the output of ceph -s from three points in time during
the rebuild below [1][2][3].

First the number of PGs in active+undersized+degraded+remapped+backfill_wait
decreases, and only much later does the number in
undersized+degraded+remapped+backfill_wait+peered decrease.

If anyone could comment on this, I would be very thankful to know how to
proceed here, as we have had 6 disk failures this week, and each time
inactive PGs stalled the VM I/O.
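
As far as I know, Luminous added 'ceph pg force-recovery' and
'ceph pg force-backfill' to manually push individual PGs to the front of
the queue; a minimal sketch, where <pgid> is a placeholder and not a real
PG ID from our cluster:

server3:~# ceph pg force-recovery <pgid>   # ask the OSDs to recover this PG before others
server3:~# ceph pg force-backfill <pgid>   # same, but for PGs waiting on backfill

However, running this by hand after every disk failure does not feel like
the intended workflow, so I would mainly like to understand whether the
ordering described above is expected.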

Best,

Nico


[1]
  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            108752/3920931 objects misplaced (2.774%)
            Reduced data availability: 7 pgs inactive
            Degraded data redundancy: 419786/3920931 objects degraded (10.706%), 147 pgs unclean, 140 pgs degraded, 140 pgs undersized

  services:
    mon: 3 daemons, quorum server5,server3,server2
    mgr: server5(active), standbys: server3, server2
    osd: 53 osds: 52 up, 52 in; 147 remapped pgs

  data:
    pools:   2 pools, 1280 pgs
    objects: 1276k objects, 4997 GB
    usage:   13481 GB used, 26853 GB / 40334 GB avail
    pgs:     0.547% pgs not active
             419786/3920931 objects degraded (10.706%)
             108752/3920931 objects misplaced (2.774%)
             1133 active+clean
             108  active+undersized+degraded+remapped+backfill_wait
             25   active+undersized+degraded+remapped+backfilling
             7    active+remapped+backfill_wait
             6    undersized+degraded+remapped+backfilling+peered
             1    undersized+degraded+remapped+backfill_wait+peered

  io:
    client:   29980 B/s rd, 1111 kB/s wr, 17 op/s rd, 74 op/s wr
    recovery: 71727 kB/s, 17 objects/s

[2]

[11:20:15] server3:~# ceph -s
  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            103908/3920967 objects misplaced (2.650%)
            Reduced data availability: 7 pgs inactive
            Degraded data redundancy: 380860/3920967 objects degraded (9.713%), 144 pgs unclean, 137 pgs degraded, 137 pgs undersized

  services:
    mon: 3 daemons, quorum server5,server3,server2
    mgr: server5(active), standbys: server3, server2
    osd: 53 osds: 52 up, 52 in; 144 remapped pgs

  data:
    pools:   2 pools, 1280 pgs
    objects: 1276k objects, 4997 GB
    usage:   13630 GB used, 26704 GB / 40334 GB avail
    pgs:     0.547% pgs not active
             380860/3920967 objects degraded (9.713%)
             103908/3920967 objects misplaced (2.650%)
             1136 active+clean
             105  active+undersized+degraded+remapped+backfill_wait
             25   active+undersized+degraded+remapped+backfilling
             7    active+remapped+backfill_wait
             6    undersized+degraded+remapped+backfilling+peered
             1    undersized+degraded+remapped+backfill_wait+peered

  io:
    client:   40201 B/s rd, 1189 kB/s wr, 16 op/s rd, 74 op/s wr
    recovery: 54519 kB/s, 13 objects/s


[3]


  cluster:
    id:     26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
    health: HEALTH_WARN
            88382/3921066 objects misplaced (2.254%)
            Reduced data availability: 4 pgs inactive
            Degraded data redundancy: 285528/3921066 objects degraded (7.282%), 127 pgs unclean, 121 pgs degraded, 115 pgs undersized
            14 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum server5,server3,server2
    mgr: server5(active), standbys: server3, server2
    osd: 53 osds: 52 up, 52 in; 121 remapped pgs

  data:
    pools:   2 pools, 1280 pgs
    objects: 1276k objects, 4997 GB
    usage:   14014 GB used, 26320 GB / 40334 GB avail
    pgs:     0.313% pgs not active
             285528/3921066 objects degraded (7.282%)
             88382/3921066 objects misplaced (2.254%)
             1153 active+clean
             78   active+undersized+degraded+remapped+backfill_wait
             33   active+undersized+degraded+remapped+backfilling
             6    active+recovery_wait+degraded
             6    active+remapped+backfill_wait
             2    undersized+degraded+remapped+backfill_wait+peered
             2    undersized+degraded+remapped+backfilling+peered

  io:
    client:   56370 B/s rd, 5304 kB/s wr, 11 op/s rd, 44 op/s wr
    recovery: 37838 kB/s, 9 objects/s


And our tree:

[12:53:57] server4:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       39.84532 root default
-6        7.28383     host server1
25   hdd  4.59999         osd.25       up  1.00000 1.00000
48   ssd  0.22198         osd.48       up  1.00000 1.00000
49   ssd  0.22198         osd.49       up  1.00000 1.00000
50   ssd  0.22198         osd.50       up  1.00000 1.00000
51   ssd  0.22699         osd.51       up  1.00000 1.00000
52   ssd  0.22198         osd.52       up  1.00000 1.00000
53   ssd  0.22198         osd.53       up  1.00000 1.00000
54   ssd  0.22198         osd.54       up  1.00000 1.00000
55   ssd  0.22699         osd.55       up  1.00000 1.00000
56   ssd  0.22198         osd.56       up  1.00000 1.00000
57   ssd  0.22198         osd.57       up  1.00000 1.00000
58   ssd  0.22699         osd.58       up  1.00000 1.00000
59   ssd  0.22699         osd.59       up  1.00000 1.00000
-2       11.95193     host server2
21   hdd  4.59999         osd.21       up  1.00000 1.00000
24   hdd  4.59999         osd.24       up  1.00000 1.00000
 0   ssd  0.68799         osd.0        up  1.00000 1.00000
 4   ssd  0.68799         osd.4        up  1.00000 1.00000
 6   ssd  0.68799         osd.6        up  1.00000 1.00000
10   ssd  0.68799         osd.10       up  1.00000 1.00000
-3        6.71286     host server3
17   hdd  0.09999         osd.17       up  1.00000 1.00000
20   hdd  4.59999         osd.20     down        0 1.00000
 1   ssd  0.22198         osd.1        up  1.00000 1.00000
 7   ssd  0.22198         osd.7        up  1.00000 1.00000
12   ssd  0.22198         osd.12       up  1.00000 1.00000
15   ssd  0.22699         osd.15       up  1.00000 1.00000
23   ssd  0.22198         osd.23       up  1.00000 1.00000
27   ssd  0.22198         osd.27       up  1.00000 1.00000
29   ssd  0.22699         osd.29       up  1.00000 1.00000
33   ssd  0.22198         osd.33       up  1.00000 1.00000
42   ssd  0.22699         osd.42       up  1.00000 1.00000
-5        6.61287     host server4
31   hdd  4.59999         osd.31       up  1.00000 1.00000
 3   ssd  0.22198         osd.3        up  1.00000 1.00000
11   ssd  0.22198         osd.11       up  1.00000 1.00000
16   ssd  0.22699         osd.16       up  1.00000 1.00000
19   ssd  0.22198         osd.19       up  1.00000 1.00000
28   ssd  0.22198         osd.28       up  1.00000 1.00000
37   ssd  0.22198         osd.37       up  1.00000 1.00000
41   ssd  0.22198         osd.41       up  1.00000 1.00000
43   ssd  0.22699         osd.43       up  1.00000 1.00000
46   ssd  0.22699         osd.46       up  1.00000 1.00000
-4        7.28383     host server5
 8   hdd  4.59999         osd.8        up  1.00000 1.00000
 2   ssd  0.22198         osd.2        up  1.00000 1.00000
 5   ssd  0.22198         osd.5        up  1.00000 1.00000
 9   ssd  0.22198         osd.9        up  1.00000 1.00000
14   ssd  0.22699         osd.14       up  1.00000 1.00000
18   ssd  0.22198         osd.18       up  1.00000 1.00000
22   ssd  0.22198         osd.22       up  1.00000 1.00000
26   ssd  0.22198         osd.26       up  1.00000 1.00000
30   ssd  0.22699         osd.30       up  1.00000 1.00000
36   ssd  0.22198         osd.36       up  1.00000 1.00000
40   ssd  0.22198         osd.40       up  1.00000 1.00000
45   ssd  0.22699         osd.45       up  1.00000 1.00000
47   ssd  0.22699         osd.47       up  1.00000 1.00000
[12:54:13] server4:~#



--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch


