Hello,

Recently I had an issue with an OSD process running on top of a dying disk: the disk suddenly started remapping sectors, so the OSD was stalled for a couple of minutes. Unfortunately, flapping prevention was never triggered, since writes were merely degraded, not frozen.

Maybe it would be worth introducing a self-marking mechanism: a separate thread that watches the queue of unflushed operations and raises a flag when a long-duration watermark is crossed, say on the order of minutes. It would be helpful in combination with a relatively high down_out interval (mon_osd_down_out_interval) and in very large setups, where a single degraded storage device can bring the entire data placement to its knees (and, for whatever reason, no flapping shows up).

Right now I can do this job with an orchestrator watching per-socket statistics, but that approach is not very reliable.
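
For reference, here is a rough sketch of the kind of external watchdog I mean (Python; it assumes the dump_ops_in_flight admin socket output with a per-op "age" field, has to run on the host that owns the OSD, and the OSD id, watermark and the "ceph osd down" reaction are just placeholders, not a finished tool):

#!/usr/bin/env python3
# Rough sketch only: poll one OSD's admin socket for in-flight ops and
# mark the OSD down once the oldest op has been stuck past a watermark.
# OSD_ID, WATERMARK_SEC and POLL_SEC are placeholders; this must run on
# the host where the OSD's admin socket lives.
import json
import subprocess
import time

OSD_ID = 12            # hypothetical OSD to watch
WATERMARK_SEC = 120    # "minutes-scale" threshold from the proposal above
POLL_SEC = 10

def oldest_op_age(osd_id):
    """Age in seconds of the oldest in-flight op, or 0.0 if there are none."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_ops_in_flight"])
    ops = json.loads(out).get("ops", [])
    return max((float(op.get("age", 0.0)) for op in ops), default=0.0)

while True:
    try:
        age = oldest_op_age(OSD_ID)
    except (subprocess.CalledProcessError, ValueError):
        age = 0.0      # admin socket unreachable: leave it to normal heartbeats
    if age > WATERMARK_SEC:
        # Raise the flag: mark the OSD down so peering can move on without it.
        subprocess.call(["ceph", "osd", "down", str(OSD_ID)])
    time.sleep(POLL_SEC)

Doing the same watermark check inside the OSD itself would obviously be more robust than polling the admin socket from the outside like this.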