On 11.06.21 at 11:48, Dan van der Ster wrote:
> On Fri, Jun 11, 2021 at 11:08 AM Peter Lieven <pl@xxxxxxx> wrote:
>> On 10.06.21 at 17:45, Manuel Lausch wrote:
>>> Hi Peter,
>>>
>>> your suggestion pointed me to the right spot.
>>> I didn't know about the feature that Ceph will read from replica
>>> PGs.
>>>
>>> So I found two functions in osd/PrimaryLogPG.cc:
>>> "check_laggy" and "check_laggy_requeue". Both first check whether
>>> the partners have the octopus features; if not, the function is
>>> skipped. This explains why the problem began after about half of
>>> the cluster was updated.
>>>
>>> To verify this, I added "return true" in the first line of both
>>> functions, and the issue is gone with that. But I don't know what
>>> problems this could trigger, and I know the root cause is not
>>> fixed by it. I think I will open a bug ticket with this knowledge.
>>
>> I wonder if I faced the same issue. The issue I had occurred when
>> OSDs came back up and peering started.
>>
>> My cluster was a fresh octopus install, so I think the min osd
>> release was set to octopus.
>>
>> Is it generally safe to stay with this switch at nautilus while
>> running octopus, in order to run a maintained release?
>
> I would guess not -- one should follow the upgrade instructions as
> documented. (I was merely confirming that Manuel had indeed followed
> that procedure.)
>
> IMHO this issue, as described by Manuel, is not understood.
>
> Manuel, could you create a tracker and upload any relevant logs from
> when the slow requests began?

I would be interested whether the time is also spent in the
"waiting for readable" state. To check this, I reduced the slow ops
timeout to 1 second and then fetched the info about the slow ops from
the affected OSD daemons.

Manuel, can you check this as well?

Thanks,
Peter
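
PS: In case a sketch helps to picture what Manuel describes, below is a
minimal, self-contained C++ model of the gating logic as I understand it
from his mail -- it is NOT the actual osd/PrimaryLogPG.cc code. The Peer
struct, the feature-bit value, and the helper names are made up for
illustration; only the shape of the check (skip it when not all peers
advertise the octopus feature, otherwise possibly park the op as
"waiting for readable") and the spot where the "return true" hack goes
are taken from his description.

#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical stand-in for an OSD peer in the acting set.
struct Peer {
  uint64_t features;      // feature bits advertised by this OSD
  bool     readable_now;  // whether it currently holds a valid read lease (modelled)
};

// Placeholder bit, not the real SERVER_OCTOPUS feature value.
constexpr uint64_t FEATURE_SERVER_OCTOPUS = 1ULL << 57;

// Returns true if the op may proceed, false if it would be queued as
// "waiting for readable".
bool check_laggy(const std::vector<Peer>& acting_set)
{
  // Manuel's diagnostic hack would be an unconditional "return true;"
  // right here, i.e. never park the request.

  // Pre-octopus peers do not know about read leases, so the check is
  // skipped for mixed clusters -- matching the observation that the
  // problem only started once about half of the cluster was upgraded.
  for (const auto& p : acting_set) {
    if (!(p.features & FEATURE_SERVER_OCTOPUS)) {
      return true;  // mixed cluster: skip the laggy check entirely
    }
  }

  // All peers speak octopus: only proceed if everything is readable now.
  for (const auto& p : acting_set) {
    if (!p.readable_now) {
      return false;  // op would wait in "waiting for readable"
    }
  }
  return true;
}

int main()
{
  std::vector<Peer> mixed    = {{0, true}, {FEATURE_SERVER_OCTOPUS, false}};
  std::vector<Peer> upgraded = {{FEATURE_SERVER_OCTOPUS, true},
                                {FEATURE_SERVER_OCTOPUS, false}};

  std::cout << "mixed cluster op proceeds:    " << check_laggy(mixed)    << "\n";  // prints 1
  std::cout << "upgraded cluster op proceeds: " << check_laggy(upgraded) << "\n";  // prints 0
}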