Re: [WRN] map e### wrongly marked me down or wrong addr

On 02/27/2012 06:03 PM, Sage Weil wrote:
On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
Hello,

whenever I restart osd.0 I see a pair of messages like

2012-02-27 17:26:00.132666 mon.0 <osd_1_ip>:6789/0 106 : [INF] osd.0
<osd_0_ip>:6801/29931 failed (by osd.1 <osd_1_ip>:6806/20125)
2012-02-27 17:26:21.074926 osd.0 <osd_0_ip>:6801/29931 1 : [WRN] map e370
wrongly marked me down or wrong addr

a couple of times. The situation stabilizes in a normal state after about two
minutes.

Should I worry about this? Maybe the first message is about the just killed
OSD, and the second comes from the new incarnation, and this is completely
normal? This is Ceph 0.41.

It's not normal.  Wido was seeing something similar, I think.  I suspect
the problem is that during startup ceph-osd is just busy, but the heartbeat
code is written such that it's not supposed to miss heartbeats even then.

I haven't seen the "wrongly marked me down" messages; I'm just seeing
'pairs' of OSDs marking each other down.

Still trying to figure that one out.


Can you reproduce this with 'debug ms = 1'?
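
For reference, a minimal sketch of how that setting might look in
ceph.conf. Placing it under [osd] so that it applies to every ceph-osd
daemon is an assumption, not something stated in this thread. The file is
read when the daemon starts, which fits here since the messages show up on
restart.

```ini
[osd]
    ; messenger debug logging, as requested above
    ; (the [osd] section is an assumption; [global] would cover all daemons)
    debug ms = 1
```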

sage