Re: Slow recovery on Quincy

胡玮文 <huww98@xxxxxxxxxxx> · Tue, 16 May 2023 18:07:28 +0000

Hi Sake,

We are experiencing the same thing. I set “osd_mclock_cost_per_byte_usec_hdd” to 0.1 (the default is 2.6) and got about 15 times the backfill speed, without significantly affecting client IO. The default for this parameter seems to have been calculated wrongly: going by its description, 5e-3 would be a reasonable value for an HDD (corresponding to 200MB/s). I noticed that the default was originally 5.2 and was later changed to 2.6 to increase the recovery speed. So I suspect the original author simply converted the units incorrectly, and may have meant 5.2e-3 but wrote 5.2 in the code.
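To make the unit argument concrete, here is a rough sketch of the arithmetic. It assumes the parameter literally means "microseconds charged per byte", so the implied throughput is just its reciprocal. The helper name is made up for illustration and is not part of Ceph, and the real mClock scheduler uses this cost as a scheduling weight, so these numbers are not hard limits.

```python
# Hypothetical helper (not a Ceph API): treat the option value as
# "microseconds per byte" and convert it to the throughput it implies.
def implied_throughput_mb_per_s(cost_per_byte_usec: float) -> float:
    bytes_per_second = 1e6 / cost_per_byte_usec  # 1e6 microseconds in a second
    return bytes_per_second / 1e6                # decimal megabytes per second


if __name__ == "__main__":
    candidates = [
        ("original default", 5.2),
        ("current default", 2.6),
        ("value I set", 0.1),
        ("suggested for HDD", 5e-3),
    ]
    for label, cost in candidates:
        mbps = implied_throughput_mb_per_s(cost)
        print(f"{label:>18}: {cost:g} usec/byte -> {mbps:.3f} MB/s")
```

Under that reading, 2.6 corresponds to roughly 0.38 MB/s and 0.1 to 10 MB/s, while only a value around 5e-3 gets to the 200MB/s one would expect from an HDD.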

But all of this may not matter in the next version. I see that the relevant code has been rewritten and this parameter has been removed.

The high_recovery_ops profile works very poorly for us. It increases the average latency of client IO from 50ms to about 1s.

Weiwen Hu

On May 16, 2023, at 19:16, Sake Paulusma <sake1989@xxxxxxxxxxx> wrote:

We noticed extremely slow performance when remapping is necessary. We didn't do anything special other than assigning the correct device_class (to ssd). When checking ceph status, we notice the number of objects recovering is around 17-25 (with watch -n 1 -c ceph status).

How can we speed up the recovery process?

There isn't any client load, because we're going to migrate to this cluster in the future, so only an occasional rsync is being executed.

[ceph: root@pwsoel12998 /]# ceph status
 cluster:
   id:     da3ca2e4-ee5b-11ed-8096-0050569e8c3b
   health: HEALTH_WARN
           noscrub,nodeep-scrub flag(s) set

 services:
   mon: 5 daemons, quorum pqsoel12997,pqsoel12996,pwsoel12994,pwsoel12998,prghygpl03 (age 3h)
   mgr: pwsoel12998.ylvjcb(active, since 3h), standbys: pqsoel12997.gagpbt
   mds: 4/4 daemons up, 2 standby
   osd: 32 osds: 32 up (since 73m), 32 in (since 6d); 10 remapped pgs
        flags noscrub,nodeep-scrub

 data:
   volumes: 2/2 healthy
   pools:   5 pools, 193 pgs
   objects: 13.97M objects, 853 GiB
   usage:   3.5 TiB used, 12 TiB / 16 TiB avail
   pgs:     755092/55882956 objects misplaced (1.351%)
            183 active+clean
            10  active+remapped+backfilling

 io:
   recovery: 2.3 MiB/s, 20 objects/s

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx