Thanks for your reply. What I meant by high load was the load as seen by
the top command; all the servers have a load average over 10. I added one
more node to get more space.

This is what I get from ceph status:

  cluster:
    id:     <redacted>
    health: HEALTH_WARN
            2 failed cephadm daemon(s)
            48 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't
            resolve itself): 24 pgs backfill_toofull
            4 pool(s) nearfull

  services:
    mon: 5 daemons, quorum ceph03,ceph02,ceph05,ceph01,ceph04 (age 4h)
    mgr: ceph03.xmbwxh(active, since 2d), standbys: ceph01.ecfgwz, ceph10.rcvwmp
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 61 osds: 61 up (since 4h), 61 in (since 4h); 1264 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 4465 pgs
    objects: 26.53M objects, 91 TiB
    usage:   284 TiB used, 75 TiB / 359 TiB avail
    pgs:     8613187/79613362 objects misplaced (10.819%)
             3201 active+clean
             1240 active+remapped+backfilling
               22 active+remapped+backfill_toofull
                2 active+remapped+backfill_wait+backfill_toofull

  io:
    client:   624 MiB/s rd, 1.6 KiB/s wr, 263 op/s rd, 17 op/s wr
    recovery: 164 MiB/s, 45 objects/s

The performance balance is as I expected, with priority given to client
traffic. I get a lot of health warnings about osd_slow_ping_time_back,
osd_slow_ping_time_front and slow_ops. I also noticed that there are 1240
pgs backfilling in parallel. Is that expected?

/Jimmy

On Wed, Jul 6, 2022 at 3:28 PM Sridhar Seshasayee <sseshasa@xxxxxxxxxx> wrote:
>
> Hi Jimmy,
>
> As you rightly pointed out, the OSD recovery priority settings do not
> work because of the change to mClock. By default, the "high_client_ops"
> profile is enabled, and this optimizes client ops when compared to
> recovery ops. Recovery ops will take the longest time to complete with
> this profile, and this is expected.
>
> When you say "load avg on my servers is high", I am assuming it's the
> recovery load. If you want recovery ops to complete faster, you can
> first try changing the mClock profile to "balanced" on all OSDs and see
> if it improves the situation. The "high_recovery_ops" profile would be
> the next option, as it provides the best recovery performance. But with
> both the "balanced" and "high_recovery_ops" profiles, improved recovery
> performance comes at the expense of client ops, which will experience
> slightly higher latencies.
>
> For more details on the mClock profiles, see the mClock Config Reference:
> https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/
>
> To switch profiles, see:
> https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-enable-mclock-profile
>
> The recommendation is to change the profile on all OSDs to get the best
> performance for the operation you are interested in.
>
> -Sridhar
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
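
As a rough illustration of the profile switch described above -- a sketch
assuming the Quincy ceph CLI, where "balanced", "high_recovery_ops" and
"high_client_ops" are the profile names from the linked documentation and
osd.0 is only an example daemon id:

    # Apply a profile to every OSD via the generic "osd" config section
    ceph config set osd osd_mclock_profile balanced

    # Check which profile a given daemon is actually running with
    # (osd.0 is just an example id)
    ceph config show osd.0 osd_mclock_profile

    # Return to the default profile once the backfill has finished
    ceph config set osd osd_mclock_profile high_client_ops

Setting the option on the generic osd section changes all OSDs at once,
which matches the recommendation to change the profile on all OSDs;
per-OSD overrides (ceph config set osd.N ...) are also possible if you
want to trial a profile on a few daemons first.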
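
On the question of 1240 PGs backfilling in parallel: the concurrency is
bounded per OSD by the backfill reservation limit (osd_max_backfills)
rather than by a cluster-wide cap, so the total count of backfilling PGs
can be large on a cluster with many OSDs. A minimal sketch of how to
inspect the limit, again assuming the Quincy ceph CLI (osd.0 is only an
example):

    # Limit stored in the cluster configuration database for all OSDs
    ceph config get osd osd_max_backfills

    # Limit actually in effect on one running OSD
    ceph config show osd.0 osd_max_backfills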