FYI: setting min down reporters to 10 is somewhat risky. Unless you have a really large cluster, I would advise turning that down to 5 or lower. In a past life, we used to run that number
higher on super dense nodes, but we found that it resulted in some instances where legitimately down OSDs did not have enough peers to exceed the min down reporters.
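To illustrate that failure mode, here is a minimal sketch of the threshold check. It assumes the monitor marks an OSD down early only once the number of distinct reporting peers reaches the configured minimum, and otherwise falls back to the report timeout. The function name, the simplified rule, and the peer_detect_seconds value are assumptions for illustration, not Ceph's actual implementation.

```python
def seconds_until_marked_down(distinct_reporters, min_down_reporters,
                              report_timeout, peer_detect_seconds=30):
    """Simplified model of down detection (an assumption, not Ceph code).

    peer_detect_seconds is a hypothetical stand-in for the time peers need
    to notice missed heartbeats and report the failure to the monitor.
    """
    if distinct_reporters >= min_down_reporters:
        # Enough distinct peers reported: the OSD is marked down quickly.
        return peer_detect_seconds
    # Too few reporters: only the report timeout catches the dead OSD.
    return report_timeout


# A down OSD with only 6 reporting peers never reaches a threshold of 10,
# so it waits for the full 900 second timeout; with a threshold of 5 it does not.
print(seconds_until_marked_down(6, 10, 900))
print(seconds_until_marked_down(6, 5, 900))
```

Under this model, lowering the threshold below the number of peers that can realistically report is what restores fast detection.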
ceph-users mailing list
Date: Wednesday, November 30, 2016 at 9:24 AM


Subject: Re: [ceph-users] osd down detection broken in jewel?
It's right there in your config.
mon osd report timeout = 900
On Wed, Nov 30, 2016 at 6:39 AM, Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
In a test with Ceph Jewel we measured how long the cluster needs to detect and mark down OSDs after they are killed (with kill -9). The result: 900 seconds.
In Hammer this took about 20 - 30 seconds.
The log file of the leader monitor contains a lot of messages like:
2016-11-30 11:32:20.966567 7f158f5ab700 0 log_channel(cluster) log [DBG] : osd.7 reported failed by osd.272
Looking deeper at this: a lot of OSDs reported this exactly once. In Hammer, the OSDs reported a down OSD a few more times.
Finally there is the following, and the OSD is marked down:
2016-11-30 11:36:22.633253 7f158fdac700 0 log_channel(cluster) log [INF] : osd.7 marked down after no pg stats for 900.982893seconds
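As a rough way to check how many distinct peers actually reported a given OSD, a small script can scan the monitor log. This is a minimal sketch that assumes the two message formats shown above; the regular expressions and the summarize helper are hypothetical, not part of any Ceph tooling.

```python
import re
from collections import defaultdict

# Patterns modeled on the two log messages quoted above (an assumption).
REPORT_RE = re.compile(r"(osd\.\d+) reported failed by (osd\.\d+)")
TIMEOUT_RE = re.compile(
    r"(osd\.\d+) marked down after no pg stats for ([\d.]+)\s*seconds"
)


def summarize(lines):
    """Return (distinct reporters per OSD, timeout seconds per OSD)."""
    reporters = defaultdict(set)
    timed_out = {}
    for line in lines:
        match = REPORT_RE.search(line)
        if match:
            reporters[match.group(1)].add(match.group(2))
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timed_out[match.group(1)] = float(match.group(2))
    return reporters, timed_out


sample = [
    "2016-11-30 11:32:20.966567 7f158f5ab700 0 log_channel(cluster) log [DBG] : "
    "osd.7 reported failed by osd.272",
    "2016-11-30 11:36:22.633253 7f158fdac700 0 log_channel(cluster) log [INF] : "
    "osd.7 marked down after no pg stats for 900.982893seconds",
]
reporters, timed_out = summarize(sample)
for osd, peers in reporters.items():
    print(osd, "distinct reporters:", len(peers))
print(timed_out)
```

Comparing the distinct reporter count per OSD against the configured minimum shows whether an OSD was marked down by peer reports or only by the timeout.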
In my ceph.conf I have the following lines in the global section
mon osd min down reporters = 10
mon osd min down reports = 3
mon osd report timeout = 900
It seems the parameter "mon osd min down reports" was removed in Jewel, but the documentation has not been updated ->
Can someone tell me how Ceph Jewel detects down OSDs and marks them down in an appropriate time?
The Cluster:
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
24 hosts with 60 OSDs each -> 1440 OSDs
2 pools with replication factor 4
65536 PGs
5 Mons
