osd suicide timeout

"Deneau, Tom" <tom.deneau@xxxxxxx> · Fri, 10 Jul 2015 21:45:02 +0000

I have an osd log file from an osd that hit a suicide timeout (with the previous 10000 events logged).
(On this node I have also seen this suicide timeout happen once before, as well as a sync_entry timeout.)

I can see that 6 minutes or so before that osd died, other osds on the same node were logging
messages such as
    heartbeat_check: no reply from osd.8
so it appears that osd.8 stopped responding quite some time before it died.
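
For anyone who wants to pin down when the neighbouring osds first noticed the problem, here is a rough sketch of how the earliest such message could be found in their logs. It assumes each log line begins with a timestamp of the form "YYYY-MM-DD HH:MM:SS.ffffff"; that prefix, and the helper name first_no_reply, are assumptions made for illustration and not anything taken from the Ceph sources.

```python
import re
from datetime import datetime

# Assumption: every log line starts with "YYYY-MM-DD HH:MM:SS.ffffff".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6})")


def first_no_reply(lines, target="osd.8"):
    """Return the earliest timestamp of a 'no reply from <target>' line, or None.

    `lines` is any iterable of log lines (for example an open file).
    first_no_reply is a hypothetical helper, not part of Ceph.
    """
    # The negative lookahead keeps "osd.8" from also matching "osd.80".
    needle = re.compile(
        r"heartbeat_check: no reply from " + re.escape(target) + r"(?!\d)"
    )
    earliest = None
    for line in lines:
        if not needle.search(line):
            continue
        m = TS_RE.match(line)
        if m is None:
            # Line does not carry the assumed timestamp prefix; skip it.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if earliest is None or ts < earliest:
            earliest = ts
    return earliest
```

Running this over the log of each other osd on the node and comparing the result with the time of the suicide timeout in the osd.8 log would give the length of the unresponsive window.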

I'm wondering if there is enough information in the osd.8 log file to deduce why osd.8 stopped responding.
I don't know enough to figure it out myself.

Is there any expert who would be willing to take a look at the log file?

-- Tom Deneau, AMD
