Dear Cephers,

I have an all-flash 3-node Ceph cluster, each node with 8 SSDs as OSDs, running Ceph release 12.2.13 (Luminous). I am using the following settings:

osd_op_queue = wpq
osd_op_queue_cut_off = high
osd_recovery_sleep = 0.5
osd_min_pg_log_entries = 3000
osd_max_pg_log_entries = 10000
osd_max_backfills = 1

The problem I encountered is the following: after a failed OSD node comes back and rejoins the cluster, there is a 3-5 minute period during which the recovery workload overwhelms the system, almost stalling user IO. After these 3-5 minutes, the recovery process seems to calm down and slows to a reasonable level, giving priority to the user IO workload.

What happens during those crazy 3-5 minutes, and how can I reduce the negative impact? Any suggestions and comments are highly appreciated.

Best regards,

Samuel

huxiaoyu@xxxxxxxxxxxx
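P.S. For reference, a sketch of how the settings above can be applied; this assumes they live in the [osd] section of ceph.conf, with injectargs used for the runtime-changeable throttles (to my understanding, osd_op_queue and osd_op_queue_cut_off only take effect after an OSD restart, so they cannot be injected):

    # ceph.conf -- [osd] section on each node; osd_op_queue and
    # osd_op_queue_cut_off require an OSD restart to take effect
    [osd]
    osd_op_queue = wpq
    osd_op_queue_cut_off = high
    osd_recovery_sleep = 0.5
    osd_min_pg_log_entries = 3000
    osd_max_pg_log_entries = 10000
    osd_max_backfills = 1

    # Recovery throttles can also be injected into all running OSDs
    # without a restart (Luminous syntax):
    ceph tell osd.* injectargs '--osd_recovery_sleep 0.5 --osd_max_backfills 1'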