Hi Wido,

Could it be one of these?

    mon osd min up ratio
    mon osd min in ratio

36/120 is 0.3, so it might be one of those magic ratios at play.

Cheers,
Dan

On Thu, 29 Oct 2020, 18:05 Wido den Hollander, <wido@xxxxxxxx> wrote:
> Hi,
>
> I'm investigating an issue where 4 to 5 OSDs in a rack aren't marked as
> down when the network is cut to that rack.
>
> Situation:
>
> - Nautilus cluster
> - 3 racks
> - 120 OSDs, 40 per rack
>
> We performed a test where we turned off the Top-of-Rack network for each
> rack. This worked as expected with two racks, but with the third
> something weird happened.
>
> Of the 40 OSDs that were supposed to be marked as down, only 36 were
> marked as down.
>
> In the end it took 15 minutes for all 40 OSDs to be marked as down.
>
>   $ ceph config set mon mon_osd_reporter_subtree_level rack
>
> That setting is set to make sure that we only accept reports from other
> racks.
>
> What we saw in the logs, for example:
>
>   2020-10-29T03:49:44.409-0400 7fbda185e700 10
>   mon.CEPH2-MON1-206-U39@0(leader).osd e107102 osd.51 has 54 reporters,
>   239.856038 grace (20.000000 + 219.856 + 7.43801e-23), max_failed_since
>   2020-10-29T03:47:22.374857-0400
>
> But osd.51 was still not marked as down, even though 54 reporters had
> reported that it was down.
>
> I checked: no ping or other traffic was possible to osd.51. The host was
> unreachable.
>
> Another OSD was marked as down, but it took a couple of minutes as well:
>
>   2020-10-29T03:50:54.455-0400 7fbda185e700 10
>   mon.CEPH2-MON1-206-U39@0(leader).osd e107102 osd.37 has 48 reporters,
>   221.378970 grace (20.000000 + 201.379 + 6.34437e-23), max_failed_since
>   2020-10-29T03:47:12.761584-0400
>   2020-10-29T03:50:54.455-0400 7fbda185e700  1
>   mon.CEPH2-MON1-206-U39@0(leader).osd e107102 we have enough reporters
>   to mark osd.37 down
>
> In the end osd.51 was marked as down, but only because the MON decided
> to do so on its own after missing its beacons:
>
>   2020-10-29T03:53:44.631-0400 7fbda185e700  0 log_channel(cluster) log
>   [INF] : osd.51 marked down after no beacon for 903.943390 seconds
>   2020-10-29T03:53:44.631-0400 7fbda185e700 -1
>   mon.CEPH2-MON1-206-U39@0(leader).osd e107104 no beacon from osd.51 since
>   2020-10-29T03:38:40.689062-0400, 903.943390 seconds ago. marking down
>
> I haven't seen this happen before in any cluster. It's also strange that
> this only happens in this rack; the other two racks work fine.
>
>   ID    CLASS  WEIGHT      TYPE NAME
>   -1           1545.35999  root default
>   -206          515.12000      rack 206
>   -7             27.94499          host CEPH2-206-U16
>   ...
>   -207          515.12000      rack 207
>   -17            27.94499          host CEPH2-207-U16
>   ...
>   -208          515.12000      rack 208
>   -31            27.94499          host CEPH2-208-U16
>   ...
>
> That's what the CRUSH map looks like: straightforward, with 3x
> replication over the 3 racks.
>
> This issue only occurs in rack *207*.
>
> Has anybody seen this before or knows where to start?
>
> Wido
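
P.S. To rule those out quickly, you could dump the effective values from
the mons. A quick sketch, assuming the options live in the centralized
config store (otherwise "ceph daemon mon.<id> config get <option>" on a
mon shows the running value):

    ceph config get mon mon_osd_min_up_ratio           # default 0.3
    ceph config get mon mon_osd_min_in_ratio           # default 0.75
    ceph config get mon mon_osd_reporter_subtree_level
    ceph config get mon mon_osd_report_timeout         # default 900 s, matching the
                                                       # "no beacon for 903.94s" line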