It looks like a crushmap related problem. Could you send us the crushmap?

ceph osd getcrushmap

Thanks
-Sam

On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen
> <tommi.virtanen@xxxxxxxxxxxxx> wrote:
> > On Thu, Nov 17, 2011 at 12:48, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
> >> Hi,
> >> I am doing a cluster failure test, where I shut down one OSD and wait for
> >> the cluster to sync. But the sync never finished; at around 4-5% it stops.
> >> I stopped osd2.
> > ...
> >> 2011-11-17 16:42:45.520740 pg v1337: 600 pgs: 547 active+clean, 53
> >> active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB avail;
> >> 4025/82404 degraded (4.884%)
> > ...
> >> The osd log, the ceph.conf, pg dump, and osd dump can be found here:
> >>
> >> http://85.214.49.87/ceph/
> >
> > This looks a bit worrying:
> >
> > 2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424 >>
> > 192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
> > claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
> > wrong node!
> >
> > So osd.0 is basically refusing to talk to one of the other OSDs. I
> > don't understand the messenger well enough to know why this would be,
> > but it wouldn't surprise me if this problem kept the objects degraded
> > -- it looks like a breakage in the osd<->osd communication.
> >
> > Now if this was the reason, I'd expect a restart of all the OSDs to
> > get it back in shape; messenger state is ephemeral. Can you confirm
> > that?
>
> Probably not — that wrong node thing can occur for a lot of different
> reasons, some of which matter and most of which don't. Sam's looking
> into the problem; there's something going wrong with the CRUSH
> calculations or the monitor PG placement overrides or something...
> -Greg
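
For reference, a minimal sketch of how the crushmap Sam asks for above can be
extracted and decompiled for inspection (the output paths are only
illustrative placeholders):

    # export the compiled CRUSH map from the cluster
    ceph osd getcrushmap -o /tmp/crushmap

    # decompile it into a human-readable text form
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt

The decompiled text lists the devices, buckets, and placement rules, which is
what is needed to check whether the remaining OSDs can still satisfy the
placement rules once osd2 is stopped.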