Hello,

Recently I had an issue with an OSD process running on top of a dying disk: the disk suddenly started remapping sectors, so the OSD was stalled for a couple of minutes. Unfortunately, flapping prevention was never triggered, since writes were merely degraded, not frozen.

Maybe it would be worth introducing a self-marking mechanism: a separate thread that watches the queue of unflushed operations and raises a flag when a long-duration watermark is crossed, say on the order of minutes. It would be helpful in combination with a relatively high down_out interval (mon_osd_down_out_interval) and in very large setups, where a single degraded storage device can bring the entire data placement to its knees (and, for whatever reason, no flapping shows up).

Right now I can do this job with an orchestrator watching per-socket statistics, but that approach is not very reliable.
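
For reference, here is a rough sketch of the kind of external watchdog I mean (Python; it assumes the dump_ops_in_flight admin socket output with a per-op "age" field, has to run on the host that owns the OSD, and the OSD id, watermark and the "ceph osd down" reaction are just placeholders, not a finished tool):

#!/usr/bin/env python3
# Rough sketch only: poll one OSD's admin socket for in-flight ops and
# mark the OSD down once the oldest op has been stuck past a watermark.
# OSD_ID, WATERMARK_SEC and POLL_SEC are placeholders; this must run on
# the host where the OSD's admin socket lives.
import json
import subprocess
import time

OSD_ID = 12            # hypothetical OSD to watch
WATERMARK_SEC = 120    # "minutes-scale" threshold from the proposal above
POLL_SEC = 10

def oldest_op_age(osd_id):
    """Age in seconds of the oldest in-flight op, or 0.0 if there are none."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_ops_in_flight"])
    ops = json.loads(out).get("ops", [])
    return max((float(op.get("age", 0.0)) for op in ops), default=0.0)

while True:
    try:
        age = oldest_op_age(OSD_ID)
    except (subprocess.CalledProcessError, ValueError):
        age = 0.0      # admin socket unreachable: leave it to normal heartbeats
    if age > WATERMARK_SEC:
        # Raise the flag: mark the OSD down so peering can move on without it.
        subprocess.call(["ceph", "osd", "down", str(OSD_ID)])
    time.sleep(POLL_SEC)

Doing the same watermark check inside the OSD itself would obviously be more robust than polling the admin socket from the outside like this.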