Quoting Gregory Farnum (gfarnum@xxxxxxxxxx):
> That's a feature, but invoking it may indicate the presence of another
> issue. The OSD shuts down if
> 1) it has been deleted from the cluster, or
> 2) it has been incorrectly marked down a bunch of times by the cluster, and
> gives up, or
> 3) it has been incorrectly marked down by the cluster, and encounters an
> error when it rebinds to new network ports
>
> In your case, with the port flapping, OSDs are presumably getting marked
> down by their peers (since they can't communicate), and eventually give up
> on trying to stay alive. You can prevent/reduce that by setting
> the osd_max_markdown_count config to a very large number, if you really
> want to.

It's definitely the peers marking down the OSDs
(mon_osd_reporter_subtree_level = datacenter, mon_osd_min_down_reporters = 2
<- 3 DC setup). You have to do pretty weird stuff to achieve this, so we'll
leave osd_max_markdown_count at its default. Good to know it's a feature (in
case such a rare condition should arise).

Thanks,

Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
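
For anyone finding this in the archive: case 2 in Greg's list can be pictured as a sliding-window counter. The sketch below is only a toy model, not Ceph's code. The class name, method name, and the numbers 5 and 600 are assumptions made for illustration, as is the idea that markdowns are counted within a time window; only osd_max_markdown_count itself is named in this thread.

```python
from collections import deque


class MarkdownTracker:
    """Toy model of 'marked down too often, so give up' (hypothetical helper)."""

    def __init__(self, max_count=5, period=600.0):
        # max_count plays the role of osd_max_markdown_count.
        # period (seconds) is an assumed time window; both values are
        # illustrative, not taken from this thread.
        self.max_count = max_count
        self.period = period
        self.events = deque()  # timestamps of recent wrongful markdowns

    def record_markdown(self, now):
        """Record a wrongful markdown at time `now` (seconds).

        Returns True if the daemon should give up and shut down.
        """
        self.events.append(now)
        # Forget markdowns that fell out of the time window.
        while self.events and now - self.events[0] > self.period:
            self.events.popleft()
        return len(self.events) > self.max_count


if __name__ == "__main__":
    flapping = MarkdownTracker()
    # A flapping port: one wrongful markdown every 10 seconds.
    for i in range(6):
        gave_up = flapping.record_markdown(i * 10.0)
        print(f"t={i * 10:>3}s  give up: {gave_up}")
```

In this model, setting max_count to a very large number means the give-up branch is effectively never reached, which is the effect Greg describes for a large osd_max_markdown_count.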