On Fri, Jun 11, 2021 at 11:08 AM Peter Lieven <pl@xxxxxxx> wrote:
>
> On 10.06.21 at 17:45, Manuel Lausch wrote:
> > Hi Peter,
> >
> > your suggestion pointed me to the right spot.
> > I didn't know about the feature that Ceph will read from replica
> > PGs.
> >
> > I found two functions in osd/PrimaryLogPG.cc:
> > "check_laggy" and "check_laggy_requeue". Both first check whether
> > the peers have the octopus features; if not, the function is
> > skipped. This explains why the problem began once about half the
> > cluster was updated.
> >
> > To verify this, I added "return true" as the first line of both
> > functions. The issue is gone with that change, but
> > I don't know what problems this could trigger. I know the root cause
> > is not fixed by it.
> > I think I will open a bug ticket with this knowledge.
>
> I wonder if I faced the same issue. The issue I had occurred when OSDs came back up and peering started.
> My cluster was a fresh Octopus install, so I think the min osd release was set to octopus.
>
> Is it in general safe to stay with this switch at nautilus and run Octopus, in order to run a maintained release?

I would guess not -- one should follow the upgrade instructions as documented. (I was merely confirming that Manuel had indeed followed that procedure.)

IMHO this issue, as described by Manuel, is not understood. Manuel, could you create a tracker and upload any relevant logs from when the slow requests began?

-- dan

> > osd_op_queue_cut_off is set to high
> > and ICMP rate limiting should not happen
>
> It could if you choose fast shutdown and connections to the OSD daemon are refused with icmp port unreachable?!
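
[For reference, the diagnostic workaround Manuel describes amounts to short-circuiting both checks in osd/PrimaryLogPG.cc. A rough sketch of the change, with the surrounding code paraphrased from his description rather than copied from the tree, so function signatures and context lines are approximate:]

```diff
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
 bool PrimaryLogPG::check_laggy(OpRequestRef& op)
 {
+  // Diagnostic workaround only: unconditionally skip the laggy check.
+  // This masks the slow-request symptom but does NOT fix the root cause.
+  return true;
   // ... original feature/laggy logic follows (skipped when peers lack
   // the octopus features) ...
```

[The same early `return true` is applied to check_laggy_requeue. As Manuel notes, this is only useful to confirm the code path involved; it should not be run in production.]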
>
> Peter

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx