Okay, I poked around a bit more and found this document:
https://docs.ceph.com/en/latest/dev/osd_internals/stale_read/

I don't understand exactly what it is all about, how it works, and what
the intention behind it is. But there is one config option mentioned:
"osd_pool_default_read_lease_ratio". It defaults to 0.8. Multiplied with
osd_heartbeat_grace (default 20), it sets that "read lease" to 16
seconds?!

I set this ratio to 0.2, which leads to a 4 second lease time. With
that, the problem is solved. No more slow ops.

Until now I thought this was a problem only on huge clusters, but given
this setting I assumed it should be an issue on quite small clusters as
well. So I tested it on a 3-node, 12-OSD SSD cluster on Octopus and hit
the same issues.

I can't believe I am the first one to have this problem.

Manuel


On Thu, 10 Jun 2021 17:45:02 +0200
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

> Hi Peter,
>
> your suggestion pointed me to the right spot.
> I didn't know about the feature that Ceph will read from replica
> PGs.
>
> So far I found two functions in osd/PrimaryLogPG.cc:
> "check_laggy" and "check_laggy_requeue". Both first check whether
> the peers have the Octopus features; if not, the function is
> skipped. This explains why the problem began after about half of
> the cluster was updated.
>
> To verify this, I added "return true" as the first line of both
> functions. The issue is gone with it, but I don't know what
> problems this could trigger. I know the root cause is not fixed
> by it.
> I think I will open a bug ticket with this knowledge.
>
> osd_op_queue_cutoff is set to high
> and ICMP rate limiting should not happen
>
> Thanks
> Manuel
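
PS: to make the arithmetic explicit, this is my reading of how the lease
interval is derived, and roughly how I changed the ratio for the test.
The config level ("global") and whether it takes effect without an OSD
restart are assumptions on my side, so treat this as a sketch rather
than a verified recipe:

    # read lease interval = osd_heartbeat_grace * osd_pool_default_read_lease_ratio
    #   defaults:     20 s * 0.8 = 16 s
    #   my test run:  20 s * 0.2 =  4 s
    ceph config set global osd_pool_default_read_lease_ratio 0.2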