> On Jan 12, 2024, at 03:31, Phong Tran Thanh <tranphong079@xxxxxxxxx> wrote:
>
> Hi Yang and Anthony,
>
> I found a solution to this problem on 7200 rpm HDDs.
>
> When the cluster is recovering from one or more disk failures, slow ops
> appear and affect the cluster. We can change these configuration options
> to reduce recovery IOPS:
>
> osd_mclock_profile=custom
> osd_mclock_scheduler_background_recovery_lim=0.2
> osd_mclock_scheduler_background_recovery_res=0.2
> osd_mclock_scheduler_client_wgt

This got cut off. What value are you using for wgt?

And how are you setting these? With 17.2.5 I get

[rook@rook-ceph-tools-5ff8d58445-gkl5w /]$ ceph config set osd osd_mclock_scheduler_background_recovery_res 0.2
Error EINVAL: error parsing value: strict_si_cast: unit prefix not recognized

but with 17.2.6 it works. The wording isn't clear, but I suspect this is
a function of https://tracker.ceph.com/issues/57533

> On Wed, Jan 10, 2024 at 11:22, David Yang <gmydw1118@xxxxxxxxx> wrote:
>
>> The 2*10Gbps shared network seems to be full (1.9GB/s).
>> Is it possible to reduce part of the workload and wait for the cluster
>> to return to a healthy state?
>> Tip: Erasure coding needs to collect all data blocks when recovering
>> data, so it takes up a lot of network card bandwidth and processor
>> resources.
>
> --
> Best regards,
> ----------------------------------------------------------------------------
>
> *Tran Thanh Phong*
>
> Email: tranphong079@xxxxxxxxx
> Skype: tranphong079
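
For anyone following along, a minimal sketch of how the settings quoted
above might be applied at runtime. The client weight value is an
assumption on my part, since the original message was cut off before it:

# Sketch, not a recommendation: switch the OSDs to the custom mClock
# profile, then reserve/limit background recovery. On 17.2.5 the
# fractional values below are rejected (see the EINVAL above);
# 17.2.6 accepts them.
ceph config set osd osd_mclock_profile custom
ceph config set osd osd_mclock_scheduler_background_recovery_res 0.2
ceph config set osd osd_mclock_scheduler_background_recovery_lim 0.2
# Placeholder weight -- the original mail never gave a value for this.
ceph config set osd osd_mclock_scheduler_client_wgt 4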
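
To confirm which release the daemons are actually running, and whether
the value took effect, something like:

ceph versions
ceph config get osd osd_mclock_profile
ceph config get osd osd_mclock_scheduler_background_recovery_res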
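
And on David's point about EC recovery saturating the network: the
recovery rate and the pool's k/m layout can be checked with standard
commands; the profile name below is whatever your pool actually uses:

# Watch client vs. recovery throughput in the status output
ceph -s
# List pools with their EC profile, then inspect k and m
ceph osd pool ls detail
ceph osd erasure-code-profile get <profile-name>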