On Mon, Oct 17, 2016 at 08:06:19AM +0000, Somnath Roy wrote:
> Thanks Piotr, Wido for the quick response.
>
> @Wido, yes, I thought of trying with those values, but I am seeing in
> the log messages that at least 7 OSDs are reporting failure, so I
> didn't try. BTW, I found that the default mon_osd_min_down_reporters
> is 2, not 1, and latest master does not have mon_osd_min_down_reports
> anymore. Not sure what it was replaced with.
>
> @Piotr, yes, your PR really helps, thanks! The part about each
> messenger needing to respond to HB is confusing. I know each thread
> has a HB timeout value beyond which it will crash with a suicide
> timeout. Are you talking about that?

Not really. As I wrote previously, if you keep filling up the pipeline,
OSDs will fail to respond to heartbeats, either because they won't
process them at all, or because they will process them but the output
pipeline will be so full that the response won't get to the recipient
in time. Suicide timeouts occur when disk threads fail to process ops
in a reasonable amount of time (hence the name: "suicide").

-- 
Piotr Dałek
branch@xxxxxxxxxxxxxxxx
http://blog.predictor.org.pl
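
A toy illustration of the heartbeat point above, not Ceph code: it assumes a single FIFO pipeline shared by client ops and heartbeat pings, and a peer that reports the OSD as failed when no reply arrives within a grace period. The function names, the fixed per-item service time, and the grace value are all hypothetical, chosen only to show how a full pipeline alone can delay a heartbeat reply past the deadline.

```python
from collections import deque


def heartbeat_reply_delay(queued_ops, op_service_time):
    """Seconds until a heartbeat queued behind `queued_ops` ops is answered.

    Hypothetical model: one worker drains a FIFO queue, and every item
    (op or heartbeat) costs the same `op_service_time` seconds.
    """
    queue = deque(["op"] * queued_ops)
    queue.append("heartbeat")  # the ping arrives behind all pending ops

    elapsed = 0.0
    while queue:
        item = queue.popleft()
        elapsed += op_service_time
        if item == "heartbeat":
            return elapsed
    raise RuntimeError("heartbeat was never queued")


def reported_down(queued_ops, op_service_time, grace):
    """True if the reply would arrive after the peer's grace period."""
    return heartbeat_reply_delay(queued_ops, op_service_time) > grace


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 s per item, 20 s grace.
    for depth in (10, 100):
        delay = heartbeat_reply_delay(depth, 0.5)
        print(depth, delay, reported_down(depth, 0.5, 20.0))
```

In this model nothing crashes and no thread hits a timeout of its own. The OSD is reported as failed only because the reply sits behind too much queued work, which is the distinction from suicide timeouts drawn above.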