Re: Performance events and auto-down marking

On Sun, 30 Jun 2013, Andrey Korolyov wrote:
> Recently I had an issue with an OSD process that had a dying disk
> under it: the disk suddenly started doing cluster remapping, so the
> OSD was stale for a couple of minutes. Unfortunately, flapping
> prevention was not triggered, since writes were merely degraded, not
> frozen. Maybe it would be worth introducing a self-marking mechanism:
> a separate thread that watches the queue of non-flushed operations and
> raises a flag when a long-time watermark is crossed, say, minutes.
> It would be helpful in combination with a relatively high down_out
> interval and in very large setups, where one degraded storage device
> can bring the entire data placement to its knees (and flaps do not
> occur for some reason). Right now I can do this job using an
> orchestrator that watches per-socket statistics, but that is not very
> reliable.

There is already an internal check that makes the OSD stop heartbeating
if the internal io thread doesn't make progress for 15 seconds (by
default, IIRC).  Was the disk making some progress (just very slowly),
preventing this from kicking in?
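
To make the distinction concrete, here is a rough sketch (in Python, and not taken from the Ceph code) of the two kinds of check being discussed: a "no progress for N seconds" check like the one described above, and the proposed watermark on the age of the oldest non-flushed operation. The class OpTracker, its method names, the 120-second watermark and the simulated disk that completes one operation every 10 seconds are all assumptions made up for illustration; only the 15-second figure comes from this thread.

```python
import time
from collections import deque


class OpTracker:
    """Hypothetical tracker of non-flushed operations (illustration only)."""

    def __init__(self, stall_timeout=15.0, age_watermark=120.0,
                 clock=time.monotonic):
        self.stall_timeout = stall_timeout  # seconds without any completion
        self.age_watermark = age_watermark  # max tolerated age of oldest op
        self.clock = clock
        self.pending = deque()              # submit times, oldest first
        self.last_progress = clock()

    def submit(self):
        if not self.pending:
            # An idle queue is not a stall: restart the progress timer.
            self.last_progress = self.clock()
        self.pending.append(self.clock())

    def complete_oldest(self):
        self.pending.popleft()
        self.last_progress = self.clock()

    def stalled(self):
        """True if work is pending and nothing completed for stall_timeout."""
        return (bool(self.pending)
                and self.clock() - self.last_progress > self.stall_timeout)

    def over_watermark(self):
        """True if the oldest pending op is older than age_watermark."""
        return (bool(self.pending)
                and self.clock() - self.pending[0] > self.age_watermark)


class FakeClock:
    """Manually advanced clock so the simulation runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def simulate_slow_disk():
    """Slow disk: 30 ops queued at t=0, one completion every 10 seconds."""
    clock = FakeClock()
    tracker = OpTracker(clock=clock)
    for _ in range(30):
        tracker.submit()

    ever_stalled = False
    first_over_watermark = None
    for _ in range(20):
        clock.advance(10.0)
        ever_stalled = ever_stalled or tracker.stalled()
        tracker.complete_oldest()
        if first_over_watermark is None and tracker.over_watermark():
            first_over_watermark = clock.now
    return ever_stalled, first_over_watermark


if __name__ == "__main__":
    print(simulate_slow_disk())
```

In this simulation the progress-based check never fires, because something completes every 10 seconds, while the age-based watermark is crossed once the oldest queued operation has waited more than 120 seconds. A real implementation would also have to decide what to do when the flag is raised (for example, stop heartbeating), which this sketch leaves out.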

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



