On 2/11/21 1:39 PM, Davor Cubranic wrote:
> But the config reference says “high” is already the default value?
> (https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/)
>

It is not the default in Nautilus. See
https://docs.ceph.com/en/nautilus/rados/configuration/osd-config-ref/?#operations

osd op queue cut off

    Description
        This selects which priority ops will be sent to the strict queue
        versus the normal queue. The low setting sends all replication ops
        and higher to the strict queue, while the high option sends only
        replication acknowledgement ops and higher to the strict queue.
        Setting this to high should help when a few OSDs in the cluster
        are very busy, especially when combined with wpq in the osd op
        queue setting. OSDs that are very busy handling replication
        traffic could starve primary client traffic on these OSDs without
        these settings. Requires a restart.

    Type
        String

    Valid Choices
        low, high

    Default
        low


>> On Feb 9, 2021, at 4:42 AM, Milan Kupcevic <milan_kupcevic@xxxxxxxxxxx
>> <mailto:milan_kupcevic@xxxxxxxxxxx>> wrote:
>>
>> On 2/9/21 7:29 AM, Michal Strnad wrote:
>>>
>>> We are looking for a proper solution for slow_ops. When a disk fails
>>> and the node is restarted, a lot of slow operations appear. Even when
>>> the disk (OSD) or node is back again, most of the slow_ops are still
>>> there. On the internet we found only the advice to restart the
>>> monitor, but that is not the right approach. Do you have a better
>>> solution? How do you treat slow_ops in your production clusters?
>>>
>>> We are running the latest Nautilus on all clusters.
>>>
>>
>> This config setting should help:
>>
>>     ceph config set osd osd_op_queue_cut_off high
>>

--
Milan Kupcevic
Senior Cyberinfrastructure Engineer at Project NESE
Harvard University
FAS Research Computing
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
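
For reference, a minimal sketch of applying and verifying the setting on a
Nautilus cluster, assuming a systemd-based deployment (the ceph-osd.target
restart step and the osd.0 daemon name are illustrative assumptions; adapt
them to your installation):

    # show the value currently stored in the monitor config database
    ceph config get osd osd_op_queue_cut_off

    # switch the cut-off to the "high" behaviour described above
    ceph config set osd osd_op_queue_cut_off high

    # check what a running OSD actually uses (example daemon: osd.0)
    ceph config show osd.0 osd_op_queue_cut_off

    # the option only takes effect after an OSD restart, e.g. on each OSD host:
    systemctl restart ceph-osd.target

The restart is the part people tend to miss: until the OSDs are restarted,
ceph config show will still report the old value even though the monitor
database already holds the new one.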