"outed" 10+ OSDs, recovery was fast (300+Mbps) until it wasn't (<1Mbps)

Hey guys!

I've got a cluster with 90 OSDs spread across 5 hosts, most of them
HDD-based. After some real-world testing, performance was not up to
expectations, and as I started researching, I realized that I _should_ have
used my locally attached NVMes as BlueStore DB devices.
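
For reference, my plan for the recreate step is roughly the following
(device paths are placeholders for the actual HDD and NVMe partition):

---
# Placeholder devices: /dev/sdX = HDD for data, /dev/nvme0n1pY = NVMe DB partition
ceph-volume lvm create --bluestore --data /dev/sdX --block.db /dev/nvme0n1pY
---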

So, I decided to "out" all the OSDs on one node, wait for recovery, and
then delete and recreate those OSDs using a separate metadata device. The
recovery process was relatively straightforward (>300 Mbps) until the end,
at which point it dropped to <1 Mbps. Interestingly, the number of misplaced
objects is gradually *growing*...
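
For completeness, the "out" step itself was just the standard command,
roughly like this (OSD IDs taken from the "ceph osd tree" output below):

---
# mark every OSD on stg05 out; backfill starts as soon as they are out
ceph osd out 9 15 20 26 31 38 44 53 57 62 66 73 49 70
---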

Here's what "ceph -s" shows me:

---
  cluster:
    id:     4f4d6b12-7036-42d2-9366-8c99e4897391
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            noout flag(s) set
            131 pgs not deep-scrubbed in time
            87 pgs not scrubbed in time
            3 daemons have recently crashed

  services:
    mon: 3 daemons, quorum b,d,e (age 20h)
    mgr: a(active, since 20h)
    mds: 4/4 daemons up, 2 hot standby
    osd: 77 osds: 77 up (since 8h), 56 in (since 5d); 33 remapped pgs
         flags noout
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 2/2 healthy
    pools:   15 pools, 401 pgs
    objects: 43.43M objects, 91 TiB
    usage:   122 TiB used, 536 TiB / 659 TiB avail
    pgs:     942074/154910213 objects misplaced (0.608%)
             359 active+clean
             32  active+clean+remapped
             9   active+clean+scrubbing+deep
             1   active+remapped+backfilling

  io:
    client:   120 MiB/s rd, 17 MiB/s wr, 151 op/s rd, 319 op/s wr
    recovery: 1.7 MiB/s, 0 objects/s

  progress:
    Global Recovery Event (0s)
      [............................]
---
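
I assume the next step is to look at the one backfilling PG itself,
something like:

---
ceph pg ls backfilling        # find the PG id of the backfilling PG
ceph pg <pgid> query          # inspect its recovery_state and backfill targets
---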

And here's "ceph osd tree" (I outed all the SSD OSDs on some of my
hyperconverged hosts, and all disks on stg05):

---
ID   CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         840.67651  root default
 -3           0.93149      host node01
  0    ssd    0.93149          osd.0        up         0  1.00000
-11           0.93149      host node03
  4    ssd    0.93149          osd.4        up         0  1.00000
 -5           0.93149      host node04
  1    ssd    0.93149          osd.1        up         0  1.00000
 -7           0.93149      host node05
  2    ssd    0.93149          osd.2        up         0  1.00000
 -9           0.93149      host node06
  3    ssd    0.93149          osd.3        up         0  1.00000
-13           0.93149      host node07
  5    ssd    0.93149          osd.5        up         0  1.00000
-15           0.93149      host node08
  6    ssd    0.93149          osd.6        up         0  1.00000
-25         131.90070      host stg01
  7    hdd   10.91409          osd.7        up   1.00000  1.00000
 13    hdd   10.91409          osd.13       up   1.00000  1.00000
 14    hdd   10.91409          osd.14       up   1.00000  1.00000
 19    hdd   10.91409          osd.19       up   1.00000  1.00000
 23    hdd   10.91409          osd.23       up   1.00000  1.00000
 25    hdd   10.91409          osd.25       up   1.00000  1.00000
 30    hdd   10.91409          osd.30       up   1.00000  1.00000
 36    hdd   10.91409          osd.36       up   1.00000  1.00000
 39    hdd   10.91409          osd.39       up   1.00000  1.00000
 43    hdd   10.91409          osd.43       up   1.00000  1.00000
 48    hdd   10.91409          osd.48       up   1.00000  1.00000
 50    hdd   10.91409          osd.50       up   1.00000  1.00000
 34    ssd    0.46579          osd.34       up   1.00000  1.00000
 55    ssd    0.46579          osd.55       up   1.00000  1.00000
-31         175.56384      host stg02
 12    hdd   14.55269          osd.12       up   1.00000  1.00000
 18    hdd   14.55269          osd.18       up   1.00000  1.00000
 24    hdd   14.55269          osd.24       up   1.00000  1.00000
 29    hdd   14.55269          osd.29       up   1.00000  1.00000
 35    hdd   14.55269          osd.35       up   1.00000  1.00000
 41    hdd   14.55269          osd.41       up   1.00000  1.00000
 46    hdd   14.55269          osd.46       up   1.00000  1.00000
 52    hdd   14.55269          osd.52       up   1.00000  1.00000
 60    hdd   14.55269          osd.60       up   1.00000  1.00000
 64    hdd   14.55269          osd.64       up   1.00000  1.00000
 68    hdd   14.55269          osd.68       up   1.00000  1.00000
 72    hdd   14.55269          osd.72       up   1.00000  1.00000
  8    ssd    0.46579          osd.8        up   1.00000  1.00000
 58    ssd    0.46579          osd.58       up   1.00000  1.00000
-37         175.56384      host stg03
 11    hdd   14.55269          osd.11       up   1.00000  1.00000
 17    hdd   14.55269          osd.17       up   1.00000  1.00000
 21    hdd   14.55269          osd.21       up   1.00000  1.00000
 28    hdd   14.55269          osd.28       up   1.00000  1.00000
 32    hdd   14.55269          osd.32       up   1.00000  1.00000
 40    hdd   14.55269          osd.40       up   1.00000  1.00000
 45    hdd   14.55269          osd.45       up   1.00000  1.00000
 51    hdd   14.55269          osd.51       up   1.00000  1.00000
 56    hdd   14.55269          osd.56       up   1.00000  1.00000
 61    hdd   14.55269          osd.61       up   1.00000  1.00000
 65    hdd   14.55269          osd.65       up   1.00000  1.00000
 69    hdd   14.55269          osd.69       up   1.00000  1.00000
 74    ssd    0.46579          osd.74       up   1.00000  1.00000
 76    ssd    0.46579          osd.76       up   1.00000  1.00000
-34         175.56384      host stg04
 10    hdd   14.55269          osd.10       up   1.00000  1.00000
 16    hdd   14.55269          osd.16       up   1.00000  1.00000
 22    hdd   14.55269          osd.22       up   1.00000  1.00000
 27    hdd   14.55269          osd.27       up   1.00000  1.00000
 37    hdd   14.55269          osd.37       up   1.00000  1.00000
 42    hdd   14.55269          osd.42       up   1.00000  1.00000
 47    hdd   14.55269          osd.47       up   1.00000  1.00000
 54    hdd   14.55269          osd.54       up   1.00000  1.00000
 59    hdd   14.55269          osd.59       up   1.00000  1.00000
 63    hdd   14.55269          osd.63       up   1.00000  1.00000
 67    hdd   14.55269          osd.67       up   1.00000  1.00000
 71    hdd   14.55269          osd.71       up   1.00000  1.00000
 33    ssd    0.46579          osd.33       up   1.00000  1.00000
 75    ssd    0.46579          osd.75       up   1.00000  1.00000
-28         175.56384      host stg05
  9    hdd   14.55269          osd.9        up         0  1.00000
 15    hdd   14.55269          osd.15       up         0  1.00000
 20    hdd   14.55269          osd.20       up         0  1.00000
 26    hdd   14.55269          osd.26       up         0  1.00000
 31    hdd   14.55269          osd.31       up         0  1.00000
 38    hdd   14.55269          osd.38       up         0  1.00000
 44    hdd   14.55269          osd.44       up         0  1.00000
 53    hdd   14.55269          osd.53       up         0  1.00000
 57    hdd   14.55269          osd.57       up         0  1.00000
 62    hdd   14.55269          osd.62       up         0  1.00000
 66    hdd   14.55269          osd.66       up         0  1.00000
 73    hdd   14.55269          osd.73       up         0  1.00000
 49    ssd    0.46579          osd.49       up         0  1.00000
 70    ssd    0.46579          osd.70       up         0  1.00000
---

How can I speed up / fix the recovery of this final PG?
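
I haven't touched any of the recovery tunables yet; if that's the fix, I
assume it would be something along these lines (option names from memory,
corrections welcome):

---
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8
---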

Thanks! :)
D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


