Re: _committed_osd_maps shutdown OSD via async signal, bug or feature?

On Thu, Oct 5, 2017 at 6:48 AM Stefan Kooman <stefan@xxxxxx> wrote:
Hi,

During testing (mimicking BGP / port flaps) on our cluster we are able
to trigger a "_committed_osd_maps shutdown OSD via async signal" on the
affected OSD servers in that datacenter (OSDs in that DC become
intermittently isolated from their peers). The result is that all OSD
processes stop. Is this a bug or a feature? In other words, is there a "flap"
detection mechanism in Ceph OSD?

If it's a bug it might be related to
http://tracker.ceph.com/issues/20174. We get similar error messages on
"12.2.0". Version "12.2.1" does not log:

"-1 Fail to open
'/proc/0/cmdline' error = (2) No such file or directory
-1 received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0
-1 osd.21 1846 *** Got signal Interrupt ***
0 osd.21 1846 prepare_to_stop starting shutdown
-1 osd.21 1846 shutdown"


That's a feature, but invoking it may indicate the presence of another issue. The OSD shuts down if
1) it has been deleted from the cluster, or
2) it has been incorrectly marked down a bunch of times by the cluster, and gives up, or
3) it has been incorrectly marked down by the cluster, and encounters an error when it rebinds to new network ports

In your case, with the port flapping, OSDs are presumably getting marked down by their peers (since they can't communicate), and eventually give up on trying to stay alive. You can prevent/reduce that by setting the osd_max_markdown_count config to a very large number, if you really want to.
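The "give up after too many markdowns" behaviour described above amounts to a sliding-window counter. Below is a minimal, hypothetical Python model of that idea, not Ceph source code: the class `MarkdownTracker` and method `record_markdown` are invented for illustration. It assumes the OSD remembers the times it was wrongly marked down, forgets entries older than a window, and shuts down once the count in the window exceeds `osd_max_markdown_count`. The companion window option `osd_max_markdown_period` and the defaults used here (5 markdowns in 600 seconds) are my understanding of the Luminous-era settings and should be checked against your release.

```python
from collections import deque


class MarkdownTracker:
    """Hypothetical model of an OSD's markdown give-up rule (not Ceph source)."""

    def __init__(self, max_markdown_count=5, max_markdown_period=600.0):
        # Assumed stand-ins for osd_max_markdown_count / osd_max_markdown_period.
        self.max_count = max_markdown_count
        self.period = max_markdown_period
        self.log = deque()  # timestamps (seconds) of wrongful markdowns

    def record_markdown(self, now):
        """Record a markdown at time `now`; return True if the OSD should shut down."""
        self.log.append(now)
        # Drop markdowns that have fallen out of the sliding window.
        while self.log and self.log[0] + self.period < now:
            self.log.popleft()
        return len(self.log) > self.max_count


# A port flapping every 30 s: the sixth markdown inside the window trips the limit.
tracker = MarkdownTracker()
print([tracker.record_markdown(i * 30.0) for i in range(6)])
```

Raising the count to a very large number, as suggested above, simply means the threshold is never reached, so the OSD keeps rejoining no matter how often the network flaps; the underlying connectivity problem is still there.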
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
