Re: Proper solution of slow_ops

Milan Kupcevic <milan_kupcevic@xxxxxxxxxxx> · Tue, 9 Feb 2021 07:42:34 -0500

On 2/9/21 7:29 AM, Michal Strnad wrote:
> 
> we are looking for a proper solution to slow_ops. When a disk fails or a
> node is restarted, a lot of slow operations appear. Even after the disk
> (OSD) or node is back, most of the slow_ops are still there. On the
> internet we found only the advice that we have to restart the monitor, but
> this is not the right approach. Do you have a better solution? How do you
> handle slow_ops in your production clusters?
> 
> We are running the latest Nautilus release on all clusters.
> 

This config setting should help:

 ceph config set osd osd_op_queue_cut_off high
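To apply the setting and check that it took effect, a sequence along these lines may help. It is an untested sketch. It assumes a package-based (systemd) Nautilus deployment and uses osd.0 as a placeholder OSD id. It also assumes that running OSDs only pick up a new osd_op_queue_cut_off value on restart, so restart them one at a time and wait for the PGs to return to active+clean before moving to the next one.

```shell
#!/bin/sh
# Sketch only: assumes a systemd (package-based) deployment and that
# osd.0 (placeholder id) runs on this host; adjust ids per host.

# Store the option in the monitors' central config database for all OSDs.
ceph config set osd osd_op_queue_cut_off high

# Value stored in the config database for the "osd" section.
ceph config get osd osd_op_queue_cut_off

# Value the running daemon osd.0 is actually using.
ceph config show osd.0 osd_op_queue_cut_off

# Assumption: the op queue reads this option at startup, so restart
# OSDs one at a time, waiting for active+clean between restarts.
systemctl restart ceph-osd@0

# Check whether any SLOW_OPS warnings remain afterwards.
ceph health detail
```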

-- 
Milan Kupcevic
Senior Cyberinfrastructure Engineer at Project NESE
Harvard University
FAS Research Computing
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx