Re: Any idea why misplaced recovery won't finish?

The balancer is on; that is what triggers new misplaced objects whenever the ratio drops near or below 5%.

You may want to disable it, or by all means let it eventually finish.
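
If you want to inspect or change that behaviour: the 5% threshold is the mgr option target_max_misplaced_ratio (default 0.05) on reasonably recent releases, so roughly:

ceph balancer status                                   # current mode, and whether it is active
ceph balancer off                                      # stop queueing new moves
ceph config set mgr target_max_misplaced_ratio .07     # or raise the threshold instead

Note that turning the balancer off only stops new misplaced objects from being queued; the PGs that are already remapped will still backfill to completion.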


On 2025-01-14 at 16:03, Ml Ml wrote:
Hello List,

I have a 3-node setup with 17 HDDs (new, spinning rust).

After putting it all together and resetting the manual rack placements
back to default, it seems to recover forever.
I also changed the PG placements at some point.

The disks were very busy until this morning, when I disabled scrubbing
to see if that would speed up the recovery.

Here is my current status:
----------------------------------------
root@ceph12:~# ceph -s
   cluster:
     id:     2ed9f8fb-1316-4ef1-996d-a3223a3dd594
     health: HEALTH_WARN
             noscrub,nodeep-scrub flag(s) set

   services:
     mon: 3 daemons, quorum ceph10,ceph11,ceph12 (age 2h)
     mgr: ceph11(active, since 3h), standbys: ceph10, ceph12
     osd: 17 osds: 17 up (since 3h), 17 in (since 11d); 18 remapped pgs
          flags noscrub,nodeep-scrub

   data:
     pools:   2 pools, 257 pgs
     objects: 9.62M objects, 37 TiB
     usage:   110 TiB used, 134 TiB / 244 TiB avail
     pgs:     1534565/28857846 objects misplaced (5.318%)
              239 active+clean
              16  active+remapped+backfill_wait
              2   active+remapped+backfilling

   io:
     recovery: 25 MiB/s, 6 objects/s

I set:
ceph tell 'osd.*' injectargs --osd-max-backfills=6 --osd-recovery-max-active=9
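
(As far as I know, injectargs only changes the running daemons; on
Nautilus or later the same values should also be settable persistently
via the config database:

ceph config set osd osd_max_backfills 6
ceph config set osd osd_recovery_max_active 9
)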

My hourly recovery progress does not really advance (the misplaced
percentage decreases and then increases again; a sampling loop is
sketched after the list):
5.410
5.036
5.269
5.008
5.373
5.769
5.555
5.103
5.067
5.135
5.409
5.417
5.373
5.197
5.090
5.458
5.204
5.339
5.164
5.425
5.692
5.383
5.726
5.492
6.694
6.576
6.362
6.243
6.011
5.880
5.589
5.433
5.846
5.378
5.184
5.647
5.374
5.513
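
(A loop like the following reproduces this kind of hourly sample from
the plain-text ceph -s output; the sed scrape is just one way to do it:

while true; do
    # pull the bare percentage out of "... objects misplaced (5.318%)"
    ceph -s | sed -n 's/.*misplaced (\([0-9.]*\)%).*/\1/p'
    sleep 3600
done
)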

root@ceph12:~# ceph osd perf    (this is okay for spinning rust, I guess)
osd  commit_latency(ms)  apply_latency(ms)
  17                   0                  0
  13                   0                  0
  11                  57                 57
   9                   4                  4
   0                   0                  0
   1                  79                 79
  14                  43                 43
   2                   0                  0
   3                   0                  0
  16                  43                 43
   4                  33                 33
   5                   0                  0
   6                   0                  0
   7                   0                  0
  10                   0                  0
  12                   0                  0
   8                  48                 48

iostat -xt 3 shows that they are busy, but as far as I can tell they are not overloaded.

Any idea why it will only backfill 2 PGs at a time?
How could I speed this up?

My other cluster, with 7 nodes, 45 OSDs, and very old HDDs, can easily
do 76 objects/s.

Cheers,
Michael
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


