Hi Peter,

your suggestion pointed me to the right spot. I didn't know about the
feature that Ceph will read from replica PGs. Following that hint, I found
two functions in osd/PrimaryLogPG.cc: "check_laggy" and
"check_laggy_requeue". Both start with a check whether all peers have the
Octopus feature; if not, the rest of the function is skipped. This explains
why the problem started once about half of the cluster was updated.

To verify this, I added "return true" as the first line of both functions
(roughly as in the sketch at the end of this mail). With that change the
issue is gone, but I don't know what problems it could trigger, and I know
the root cause is not fixed by it. I think I will open a bug ticket with
this knowledge.

osd_op_queue_cut_off is set to high, and ICMP rate limiting should not be
happening here.

Thanks
Manuel


On Thu, 10 Jun 2021 11:28:48 +0200
Peter Lieven <pl@xxxxxxx> wrote:

> On 10.06.21 at 11:08, Manuel Lausch wrote:
> > Hi,
> >
> > has no one an idea what could cause this issue, or how I could debug
> > it?
> >
> > In a few days I have to go live with this cluster. If I don't have a
> > solution, I will have to go live with Nautilus.
>
> Hi Manuel,
>
> I had similar issues with Octopus and am thus stuck with Nautilus.
>
> Can you debug the slow ops and see if they are caused by the status
> "waiting for readable"?
>
> I suspected that it has something to do with the new feature in
> Octopus to read from all OSDs regardless of whether they are master
> for a PG or not.
>
> Can you also verify that osd_op_queue_cut_off is set to high and that
> icmp rate limiting is disabled on your hosts?
>
> Peter
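

For reference, here is a small self-contained sketch of the logic I am
describing above. It is NOT the real code from osd/PrimaryLogPG.cc; the
names (PeerInfo, FEATURE_OCTOPUS, the check_laggy signature) are simplified
placeholders I made up for illustration. It only shows how a feature gate
like this skips the laggy/readable check while the cluster is mixed-version,
and where the "return true" test bypass goes:

// Illustrative sketch only -- not the actual Ceph implementation.
// Models a check_laggy()-style function that is gated on every peer
// advertising the Octopus feature bit.
#include <cstdint>
#include <iostream>
#include <vector>

constexpr uint64_t FEATURE_OCTOPUS = 1ULL << 57;   // placeholder feature bit

struct PeerInfo {
  uint64_t features;   // feature bits advertised by this OSD (made up)
  bool     laggy;      // lease expired / not readable (made up)
};

// Returns true if the op may proceed, false if it must wait for readable.
bool check_laggy(const std::vector<PeerInfo>& acting)
{
  // return true;   // <-- the test bypass: skip the laggy gate entirely

  // Mixed-version gate: if any peer lacks the Octopus feature, the
  // read-lease machinery is skipped and the op proceeds immediately.
  for (const auto& p : acting)
    if (!(p.features & FEATURE_OCTOPUS))
      return true;

  // Otherwise the op is held back while a lease is not readable.
  for (const auto& p : acting)
    if (p.laggy)
      return false;        // would be queued "waiting for readable"
  return true;
}

int main()
{
  std::vector<PeerInfo> mixed    = {{FEATURE_OCTOPUS, true}, {0, true}};
  std::vector<PeerInfo> upgraded = {{FEATURE_OCTOPUS, true},
                                    {FEATURE_OCTOPUS, false}};

  std::cout << "mixed cluster, laggy peer -> proceed? "
            << check_laggy(mixed) << "\n";      // prints 1: gate skipped
  std::cout << "all-Octopus, laggy peer   -> proceed? "
            << check_laggy(upgraded) << "\n";   // prints 0: op would wait
}

Compiled with g++ -std=c++17, the mixed-version case prints 1 (the gate is
skipped, ops proceed even with a laggy peer), while the fully upgraded case
prints 0 (the op would wait for readable) -- which matches the behaviour we
only started seeing once roughly half of the OSDs were on Octopus.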