Hi Jared,

All OSD nodes produced logs like

We use Ubuntu 14.04 (one node is on 16.04) and Ceph version 10.2.10.

Thanks
Tristan
On Fri, Jul 28, 2017 at 6:06 AM, Jared Watts <Jared.Watts at quantum.com> wrote:
> I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in).
> ceph status and ceph osd tree output can be found at:
>
> https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12
>
> In osd.4 log, I see many of these:
>
> 2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no
> reply from 10.32.0.3:6807 osd.15 ever on either front or back, first ping
> sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)
>
> 2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no
> reply from 10.32.0.3:6811 osd.16 ever on either front or back, first ping
> sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)
>
> From osd.4, those endpoints look reachable:
>
> / # nc -vz 10.32.0.3 6807
> 10.32.0.3 (10.32.0.3:6807) open
>
> / # nc -vz 10.32.0.3 6811
> 10.32.0.3 (10.32.0.3:6811) open
>
> What else can I look at to determine why most of the OSDs cannot
> communicate? http://tracker.ceph.com/issues/16092 indicates this behavior
> is a networking or hardware issue, what else can I check there? I can turn
> on extra logging as needed. Thanks!

Do a packet capture on both machines at the same time and verify the
packets are arriving as expected.

> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
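To narrow the packet capture suggested above to the peers that never answered, one option is to pull the endpoints out of the heartbeat_check lines and turn them into a capture filter. The following is only a rough sketch: the function names (silent_peers, capture_filter) and the regular expression are illustrative assumptions based on the two log lines quoted in this thread, not anything provided by Ceph, and other Ceph versions may format these lines differently.

```python
import re

# Assumption: heartbeat_check lines look like the two quoted in this thread,
# i.e. "... heartbeat_check: no reply from <ip>:<port> osd.<id> ...".
HEARTBEAT_RE = re.compile(
    r"heartbeat_check: no reply from "
    r"(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+) osd\.(?P<osd>\d+)"
)


def silent_peers(log_text):
    """Return sorted, de-duplicated (ip, port, osd_id) tuples found in the log."""
    peers = set()
    for match in HEARTBEAT_RE.finditer(log_text):
        peers.add(
            (match.group("ip"), int(match.group("port")), int(match.group("osd")))
        )
    return sorted(peers)


def capture_filter(peers):
    """Build a pcap filter expression matching TCP traffic to or from each peer."""
    clauses = [f"(host {ip} and tcp port {port})" for ip, port, _osd in peers]
    return " or ".join(clauses)


# The two log lines from the original message, unwrapped onto single lines.
SAMPLE_LOG = (
    "2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: "
    "no reply from 10.32.0.3:6807 osd.15 ever on either front or back, "
    "first ping sent 2017-07-27 19:37:40.857220 "
    "(cutoff 2017-07-27 19:38:33.468850)\n"
    "2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: "
    "no reply from 10.32.0.3:6811 osd.16 ever on either front or back, "
    "first ping sent 2017-07-27 19:37:40.857220 "
    "(cutoff 2017-07-27 19:38:33.468850)\n"
)

if __name__ == "__main__":
    print(capture_filter(silent_peers(SAMPLE_LOG)))
```

The printed expression can be passed, in quotes, to a capture tool such as tcpdump on both the sending and the receiving host at the same time, so that the two captures can be compared to see whether the heartbeat packets leave one machine and arrive at the other.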
-- 
Tristan Le Toullec
System Admin, CNRS / LOPS
rue Dumont D'Urville, 29280 PLOUZANE, France
tel (work): 0290915544
email: tristan.letoullec@xxxxxxx