On Mon, 27 Feb 2012, Székelyi Szabolcs wrote:
> Hello,
>
> whenever I restart osd.0 I see a pair of messages like
>
> 2012-02-27 17:26:00.132666 mon.0 <osd_1_ip>:6789/0 106 : [INF] osd.0
> <osd_0_ip>:6801/29931 failed (by osd.1 <osd_1_ip>:6806/20125)
> 2012-02-27 17:26:21.074926 osd.0 <osd_0_ip>:6801/29931 1 : [WRN] map e370
> wrongly marked me down or wrong addr
>
> a couple of times. The situation stabilizes in a normal state after about two
> minutes.
>
> Should I worry about this? Maybe the first message is about the just killed
> OSD, and the second comes from the new incarnation, and this is completely
> normal? This is Ceph 0.41.

It's not normal. Wido was seeing something similar, I think. I suspect the
problem is that during startup ceph-osd is just busy, but the heartbeat code
is such that it's not supposed to miss them.

Can you reproduce this with 'debug ms = 1'?

sage
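
For reference, a minimal sketch of where the 'debug ms = 1' setting could go. It assumes the option is set in ceph.conf on the host running osd.0 and that the daemon is restarted afterwards to reproduce the problem; the section placement is an assumption, not something stated in the message above.

```
[osd]
        ; assumption: placed under [osd] so it applies to every OSD on this host;
        ; an [osd.0] section would limit it to the daemon being restarted
        debug ms = 1
```

With messenger debugging at this level, the OSD log should record message traffic, which would help show whether heartbeats are actually being missed during startup.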