Re: OSDs are flapping and marked down wrongly

Piotr Dałek <branch@xxxxxxxxxxxxxxxx> · Mon, 17 Oct 2016 10:19:38 +0200

On Mon, Oct 17, 2016 at 08:06:19AM +0000, Somnath Roy wrote:
> Thanks Piotr, Wido for the quick response.
> 
> @Wido, yes, I thought of trying those values, but the log messages show at least 7 OSDs reporting the failure, so I didn't try. BTW, I found that the default mon_osd_min_down_reporters is 2, not 1, and the latest master no longer has mon_osd_min_down_reports. Not sure what it was replaced with.
> 
> @Piotr, yes, your PR really helps, thanks! The point about each messenger needing to respond to HB is confusing. I know each thread has a HB timeout value, beyond which it will crash with a suicide timeout. Are you talking about that?

Not really. As I wrote previously, if you keep filling up the pipeline,
OSDs will fail to respond to heartbeats, either because they won't process
them at all, or because they will process them but the output pipeline will
be so full that the response won't reach the recipient in time.
Suicide timeouts are a different mechanism: they occur when disk threads
fail to process ops in a reasonable amount of time (hence the name:
"suicide").
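
To illustrate the first case, here is a toy model in Python. It is only a
sketch, not Ceph code: it assumes a single FIFO pipeline in which a heartbeat
ping waits behind every op queued before it. The function names, the per-op
cost and the 20 s grace period are illustrative assumptions, not values read
from any cluster.

```python
from collections import deque


def heartbeat_reply_delay(backlog_ops, op_cost_s, ping_cost_s=0.001):
    """Seconds until a ping queued behind `backlog_ops` ops is answered.

    Hypothetical model: one FIFO pipeline, every op costs `op_cost_s`.
    """
    queue = deque(["op"] * backlog_ops + ["ping"])
    clock = 0.0
    while queue:
        item = queue.popleft()
        clock += op_cost_s if item == "op" else ping_cost_s
        if item == "ping":
            return clock
    raise RuntimeError("ping was never queued")  # unreachable by construction


def looks_down(backlog_ops, op_cost_s, grace_s):
    """True if the reply arrives after the peers' grace period has expired."""
    return heartbeat_reply_delay(backlog_ops, op_cost_s) > grace_s


if __name__ == "__main__":
    # Assumed numbers: 10 ms per op, 20 s grace.
    for backlog in (10, 5000):
        print(backlog, looks_down(backlog, op_cost_s=0.01, grace_s=20.0))
```

In this model the OSD never stops working and no thread hangs, so nothing
resembling a suicide timeout is involved. With a deep enough backlog the
reply is simply late, and the peers report the OSD as failed.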

-- 
Piotr Dałek
branch@xxxxxxxxxxxxxxxx
http://blog.predictor.org.pl