Any idea why misplaced-object recovery won't finish?

Hello List,

I have a 3-node setup with 17 HDDs (new spinning rust).

After putting it all together and resetting the manual rack placements
back to the default, it seems to recover forever.
I also changed the PG placements at some point.

The disks were very busy until this morning, when I disabled scrubbing
to see if that would speed up the recovery.
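
For reference, I only set the cluster-wide flags to disable scrubbing,
i.e. the usual:

root@ceph12:~# ceph osd set noscrub
root@ceph12:~# ceph osd set nodeep-scrub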

Here is my current status:
----------------------------------------
root@ceph12:~# ceph -s
  cluster:
    id:     2ed9f8fb-1316-4ef1-996d-a3223a3dd594
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set

  services:
    mon: 3 daemons, quorum ceph10,ceph11,ceph12 (age 2h)
    mgr: ceph11(active, since 3h), standbys: ceph10, ceph12
    osd: 17 osds: 17 up (since 3h), 17 in (since 11d); 18 remapped pgs
         flags noscrub,nodeep-scrub

  data:
    pools:   2 pools, 257 pgs
    objects: 9.62M objects, 37 TiB
    usage:   110 TiB used, 134 TiB / 244 TiB avail
    pgs:     1534565/28857846 objects misplaced (5.318%)
             239 active+clean
             16  active+remapped+backfill_wait
             2   active+remapped+backfilling

  io:
    recovery: 25 MiB/s, 6 objects/s
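
To see which of the remapped PGs are actually moving and which are just
waiting for a backfill reservation, something along these lines should
do (pgs_brief lists each PG with its state and up/acting sets):

root@ceph12:~# ceph pg dump pgs_brief 2>/dev/null | grep backfill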

I set:
ceph tell 'osd.*' injectargs --osd-max-backfills=6 --osd-recovery-max-active=9
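
To double-check that the values actually reached the running daemons
(injectargs is not persistent across restarts), asking one OSD over its
admin socket should show them, e.g. on the node that hosts osd.0:

ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active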

My hourly recovery progress does not really seem to advance (the
misplaced percentage decreases and then increases again):
5.410
5.036
5.269
5.008
5.373
5.769
5.555
5.103
5.067
5.135
5.409
5.417
5.373
5.197
5.090
5.458
5.204
5.339
5.164
5.425
5.692
5.383
5.726
5.492
6.694
6.576
6.362
6.243
6.011
5.880
5.589
5.433
5.846
5.378
5.184
5.647
5.374
5.513
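
For context, those values are the misplaced percentage as reported by
ceph -s, pulled roughly once an hour with something like:

ceph -s | grep misplaced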

root@ceph12:~# ceph osd perf   (these latencies look okay for spinning rust, I guess)
osd  commit_latency(ms)  apply_latency(ms)
 17                   0                  0
 13                   0                  0
 11                  57                 57
  9                   4                  4
  0                   0                  0
  1                  79                 79
 14                  43                 43
  2                   0                  0
  3                   0                  0
 16                  43                 43
  4                  33                 33
  5                   0                  0
  6                   0                  0
  7                   0                  0
 10                   0                  0
 12                   0                  0
  8                  48                 48

iostat -xt 3 shows that the disks are busy, but as far as I can tell they are not overloaded.
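
To map the busy block devices back to OSD IDs, I believe something like
this works (the OSD metadata includes the hostname and device names),
e.g. for osd.11, which shows the highest latency above:

ceph osd metadata 11 | grep -E '"hostname"|"devices"'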

Any idea why only 2 PGs are backfilling at a time?
How could I speed this up?

My other cluster, with 7 nodes, very old HDDs, and 45 OSDs, can easily
manage 76 objects/s.

Cheers,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
