Hi Robert,

Thanks for your reply. These are actually the settings I found in the cases I referred to as "other cases" in my mail. They could be a first step. Looking at the documentation, it seems that solving the overload problem might require some of the QoS settings described below "osd op queue":

https://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#operations

I see some possibilities, but I'm not sure how to use these settings to enforce load-dependent rate limiting on clients. As far as I can see, IOPS-based QoS does not take backlog into account, which would be important for distinguishing a burst from a sustained overload. In addition, this requires mClock, which is labelled experimental. (I have put a rough sketch of what I mean at the very bottom of this mail.)

If anyone could shed some light on what possibilities currently exist beyond playing with "osd op queue" and "osd op queue cut off", that would be great. I would also be interested in any practical experience with this problem. For example, would reducing "osd client op priority" have any effect? As far as I can see, that setting only weights recovery against client IO; it does not prioritise IO already in flight over new client ops.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Robert LeBlanc <robert@xxxxxxxxxxxxx>
Sent: 23 August 2019 17:28
To: Frank Schilder
Cc: ceph-users
Subject: Re: ceph fs crashes on simple fio test

The WPQ scheduler may help your clients back off when things get busy. Put this in your ceph.conf and restart your OSDs.

osd op queue = wpq
osd op queue cut off = high

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
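
P.S. (the sketch referenced above): as far as I can tell from the "QoS Based on mClock" section of the same documentation page, a ceph.conf fragment using the experimental mClock client queue would look roughly like the following. The option names are taken from that section; the numeric values are purely illustrative and not recommendations:

[osd]
# Switch the op scheduler to the experimental per-client mClock queue.
osd op queue = mclock_client

# QoS parameters for the client-op class (roughly: operations per second
# for res/lim, a unitless share for wgt):
#   res = reserved capacity (guaranteed minimum)
#   wgt = relative weight when there is contention
#   lim = hard upper limit (illustrative value, not a recommendation)
osd op queue mclock client op res = 100.0
osd op queue mclock client op wgt = 500.0
osd op queue mclock client op lim = 1000.0

Note that "lim" caps the rate regardless of backlog, so as far as I can see this would throttle a short burst and a sustained overload in exactly the same way, which is precisely the distinction I am after.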