Hi,
What is your current mclock profile? The default is "balanced":
quincy-1:~ # ceph config get osd osd_mclock_profile
balanced
You could try setting it to high_recovery_ops [1], or disabling it
altogether [2]:
quincy-1:~ # ceph config set osd osd_op_queue wpq
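For the profile change it should just be something like this (set cluster-wide here; you could also scope it to a single OSD with osd.N instead of osd):
quincy-1:~ # ceph config set osd osd_mclock_profile high_recovery_ops
Note that the osd_op_queue change only takes effect after the OSDs have been restarted, as far as I remember.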
[1] https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/
[2] https://docs.clyso.com/blog/2023/03/22/ceph-how-do-disable-mclock-scheduler/
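To verify what a running OSD is actually using, something like this should work (osd.0 just as an example):
quincy-1:~ # ceph config show osd.0 osd_op_queue
quincy-1:~ # ceph config show osd.0 osd_mclock_profile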
Quoting Torkil Svensgaard <torkil@xxxxxxxx>:
Hi
Our 17.2.7 cluster:
"
-33   886.00842  datacenter 714
 -7   209.93135      host ceph-hdd1
-69    69.86389      host ceph-flash1
 -6   188.09579      host ceph-hdd2
 -3   233.57649      host ceph-hdd3
-12   184.54091      host ceph-hdd4
-34   824.47168  datacenter DCN
-73    69.86389      host ceph-flash2
 -5   252.27127      host ceph-hdd14
 -2   201.78067      host ceph-hdd5
-81   288.26501      host ceph-hdd6
-31   264.56207      host ceph-hdd7
-36  1284.48621  datacenter TBA
-77    69.86389      host ceph-flash3
-21   190.83224      host ceph-hdd8
-29   199.08838      host ceph-hdd9
-11   193.85382      host ceph-hdd10
 -9   237.28154      host ceph-hdd11
-26   187.19536      host ceph-hdd12
 -4   206.37102      host ceph-hdd13
"
We recently created an EC 4+5 pool with failure domain datacenter.
The DCN datacenter only had 2 hdd hosts so we added one more to make
it possible at all, since each DC needs 3 shards, as I understand it.
Backfill was really slow though, so we just added another host to
the DCN datacenter. Backfill looks like this:
"
  data:
    volumes: 1/1 healthy
    pools:   13 pools, 11153 pgs
    objects: 311.53M objects, 1000 TiB
    usage:   1.6 PiB used, 1.6 PiB / 3.2 PiB avail
    pgs:     60/1669775060 objects degraded (0.000%)
             373356926/1669775060 objects misplaced (22.360%)
             5944 active+clean
             5177 active+remapped+backfill_wait
             22   active+remapped+backfilling
             4    active+recovery_wait+degraded+remapped
             3    active+recovery_wait+remapped
             2    active+recovery_wait+degraded
             1    active+recovering+degraded+remapped

  io:
    client:   73 MiB/s rd, 339 MiB/s wr, 1.06k op/s rd, 561 op/s wr
    recovery: 1.2 GiB/s, 313 objects/s
"
Given that the first host added had 19 OSDs, with none of them
anywhere near the target capacity, and the one we just added has 22
empty OSDs, having just 22 PGs backfilling and 1 recovering seems
somewhat underwhelming.
Is this to be expected with such a pool? Mclock profile is high_recovery_ops.
Best regards,
Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil@xxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx