Re: Any idea why misplaced object recovery won't finish?

Hi Michael,

if you have the mClock scheduler active, you first need to enable the mClock
override (otherwise the backfill/recovery values you inject are ignored):

➜  ~ # ceph config set osd osd_mclock_override_recovery_settings true
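
If you are not sure whether mClock is the active scheduler (it is the
default since Quincy), you can check with:

➜  ~ # ceph config get osd osd_op_queue

which prints mclock_scheduler or wpq.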

and then change the OSD settings, e.g.:

➜  ~ # ceph config set osd osd_max_backfills 5
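
You can verify that the OSDs actually picked the value up (osd.0 here is
just an example id):

➜  ~ # ceph config show osd.0 osd_max_backfills

Alternatively, on Quincy or later you can leave the individual limits alone
and switch the mClock profile to favor recovery instead:

➜  ~ # ceph config set osd osd_mclock_profile high_recovery_ops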

Cheers

Stephan

On Tue, 14 Jan 2025 at 16:04, Ml Ml <mliebherr99@xxxxxxxxxxxxxx> wrote:

> Hello List,
>
> I have a 3-node setup with 17 HDDs (new spinning rust).
>
> After putting it all together and resetting the manual rack placements
> back to default, it seems to recover forever.
> I also changed the PG placements at some point.
>
> The disks were very busy until this morning, when I disabled scrubbing
> to see if this would speed up the recovery.
>
> Here is my current status:
> ----------------------------------------
> root@ceph12:~# ceph -s
>   cluster:
>     id:     2ed9f8fb-1316-4ef1-996d-a3223a3dd594
>     health: HEALTH_WARN
>             noscrub,nodeep-scrub flag(s) set
>
>   services:
>     mon: 3 daemons, quorum ceph10,ceph11,ceph12 (age 2h)
>     mgr: ceph11(active, since 3h), standbys: ceph10, ceph12
>     osd: 17 osds: 17 up (since 3h), 17 in (since 11d); 18 remapped pgs
>          flags noscrub,nodeep-scrub
>
>   data:
>     pools:   2 pools, 257 pgs
>     objects: 9.62M objects, 37 TiB
>     usage:   110 TiB used, 134 TiB / 244 TiB avail
>     pgs:     1534565/28857846 objects misplaced (5.318%)
>              239 active+clean
>              16  active+remapped+backfill_wait
>              2   active+remapped+backfilling
>
>   io:
>     recovery: 25 MiB/s, 6 objects/s
>
> I set:
> ceph tell 'osd.*' injectargs --osd-max-backfills=6
> --osd-recovery-max-active=9
>
> My hourly recovery progress does not really seem to advance (it decreases
> and increases again):
> 5.410
> 5.036
> 5.269
> 5.008
> 5.373
> 5.769
> 5.555
> 5.103
> 5.067
> 5.135
> 5.409
> 5.417
> 5.373
> 5.197
> 5.090
> 5.458
> 5.204
> 5.339
> 5.164
> 5.425
> 5.692
> 5.383
> 5.726
> 5.492
> 6.694
> 6.576
> 6.362
> 6.243
> 6.011
> 5.880
> 5.589
> 5.433
> 5.846
> 5.378
> 5.184
> 5.647
> 5.374
> 5.513
>
> root@ceph12:~# ceph osd perf (this is okay for spinning rust, I guess)
> osd  commit_latency(ms)  apply_latency(ms)
>  17                   0                  0
>  13                   0                  0
>  11                  57                 57
>   9                   4                  4
>   0                   0                  0
>   1                  79                 79
>  14                  43                 43
>   2                   0                  0
>   3                   0                  0
>  16                  43                 43
>   4                  33                 33
>   5                   0                  0
>   6                   0                  0
>   7                   0                  0
>  10                   0                  0
>  12                   0                  0
>   8                  48                 48
>
> iostat -xt 3 shows that they are busy, but not overloaded as far as I can tell.
>
> Any idea why it will only backfill 2 PGs at a time?
> How could I speed this up?
>
> My other cluster, with 7 nodes, very old HDDs, and 45 OSDs, can easily
> do 76 objects/s.
>
> Cheers,
> Michael
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



