Re: _committed_osd_maps shutdown OSD via async signal, bug or feature?

Stefan Kooman <stefan@xxxxxx> · Thu, 5 Oct 2017 20:47:41 +0200

Quoting Gregory Farnum (gfarnum@xxxxxxxxxx):

> That's a feature, but invoking it may indicate the presence of another
> issue. The OSD shuts down if
> 1) it has been deleted from the cluster, or
> 2) it has been incorrectly marked down a bunch of times by the cluster, and
> gives up, or
> 3) it has been incorrectly marked down by the cluster, and encounters an
> error when it rebinds to new network ports
> 
> In your case, with the port flapping, OSDs are presumably getting marked
> down by their peers (since they can't communicate), and eventually give up
> on trying to stay alive. You can prevent/reduce that by setting
> the osd_max_markdown_count config to a very large number, if you really
> want to.
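
To make the "gives up" behaviour in case 2 concrete, here is a minimal Python sketch of a sliding-window markdown counter. It is not the Ceph implementation. The class name MarkdownTracker is hypothetical, and the defaults (5 markdowns within a 600 second window, mirroring osd_max_markdown_count and osd_max_markdown_period) and the strict "greater than" comparison are assumptions that should be checked against your Ceph release.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical sliding-window counter of 'wrongly marked down' events."""

    def __init__(self, max_count=5, period=600.0):
        # Assumed defaults, modelled on osd_max_markdown_count / osd_max_markdown_period.
        self.max_count = max_count
        self.period = period
        self._times = deque()

    def record_markdown(self, now):
        """Record a markdown at time `now` (seconds); return True if the OSD should give up."""
        self._times.append(now)
        cutoff = now - self.period
        # Drop events that have fallen out of the window.
        while self._times and self._times[0] < cutoff:
            self._times.popleft()
        # Assumption: shutdown only once the count exceeds the limit.
        return len(self._times) > self.max_count
```

With these assumed defaults, six markdowns inside ten minutes trip the limit, while a very large max_count effectively disables it, which is the effect Greg describes.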

It's definitely the peers marking down the OSDs
(mon_osd_reporter_subtree_level = datacenter, mon_osd_min_down_reporters
= 2, in a 3 DC setup). You have to do pretty weird stuff to trigger this,
so we'll leave osd_max_markdown_count at its default. Good to know it's a
feature, in case such a rare condition ever arises.
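
For anyone wondering how those two monitor settings interact, here is a rough Python sketch of the idea as I understand it: failure reports are grouped by the CRUSH subtree of the reporter, and an OSD is only marked down once reports come from enough distinct subtrees. The function should_mark_down and the location_of mapping are made up for illustration; this is not the monitor's actual code.

```python
def should_mark_down(reporters, location_of, min_reporters=2):
    """Return True if failure reports come from enough distinct subtrees.

    reporters:     iterable of reporting OSD names
    location_of:   hypothetical map of OSD name -> subtree (here: datacenter)
    min_reporters: stands in for mon_osd_min_down_reporters
    """
    # Collapse reporters to the subtree they live in, so that many OSDs
    # in the same datacenter count as a single reporter.
    subtrees = {location_of[osd] for osd in reporters}
    return len(subtrees) >= min_reporters


location_of = {"osd.1": "dc1", "osd.2": "dc1", "osd.3": "dc2"}
print(should_mark_down(["osd.1", "osd.2"], location_of))  # both reporters in dc1
print(should_mark_down(["osd.1", "osd.3"], location_of))  # reporters in dc1 and dc2
```

So with subtree level datacenter and a minimum of 2, OSDs in at least two different datacenters have to agree before a peer gets marked down.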

Thanks,

Stefan

-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx