Re: Cluster sync doesn't finish

Hi Sam,

here is the crushmap

http://85.214.49.87/ceph/crushmap.txt
http://85.214.49.87/ceph/crushmap

-martin

Samuel Just wrote:
It looks like a crushmap-related problem. Could you send us the crushmap?

ceph osd getcrushmap

Thanks
-Sam
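
For reference, a rough sketch of the usual way to fetch and decompile the map; the filenames here are just examples:

  ceph osd getcrushmap -o crushmap        # fetch the compiled (binary) crushmap from the monitors
  crushtool -d crushmap -o crushmap.txt   # decompile it into the human-readable text form

The binary file is what the tools consume; the decompiled text is the readable version, which is presumably how the crushmap.txt link above was produced.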

On Fri, Nov 18, 2011 at 10:13 AM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
On Fri, Nov 18, 2011 at 10:05 AM, Tommi Virtanen
<tommi.virtanen@xxxxxxxxxxxxx> wrote:
On Thu, Nov 17, 2011 at 12:48, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
Hi,
I am doing a cluster failure test, where I shut down one OSD and wait for the
cluster to sync. But the sync never finishes; at around 4-5% it stops. I
stopped osd2.
...
2011-11-17 16:42:45.520740    pg v1337: 600 pgs: 547 active+clean, 53
active+clean+degraded; 113 GB data, 184 GB used, 1141 GB / 1395 GB avail;
4025/82404 degraded (4.884%)
...
The osd log, the ceph.conf, pg dump, and osd dump can be found here.

http://85.214.49.87/ceph/
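
As a rough sketch, the following commands help narrow down which PGs are stuck; exact subcommands and output columns can differ between versions:

  ceph -s                        # overall health and PG state summary
  ceph pg dump | grep degraded   # list the PGs that are still degraded
  ceph pg <pgid> query           # peering/recovery detail for a single PG (<pgid> is a placeholder)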
This looks a bit worrying:

2011-11-17 17:56:35.771574 7f704c834700 -- 192.168.42.113:0/2424 >>
192.168.42.114:6802/21115 pipe(0x2596c80 sd=17 pgs=0 cs=0 l=0).connect
claims to be 192.168.42.114:6802/21507 not 192.168.42.114:6802/21115 -
wrong node!

So osd.0 is basically refusing to talk to one of the other OSDs. I
don't understand the messenger well enough to know why this would be,
but it wouldn't surprise me if this problem kept the objects degraded
-- it looks like a breakage in the osd<->osd communication.

Now if this was the reason, I'd expect a restart of all the OSDs to
get it back in shape; messenger state is ephemeral. Can you confirm
that?
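
A minimal sketch of such a full restart, assuming the stock sysvinit script and a single cluster configured in /etc/ceph/ceph.conf:

  /etc/init.d/ceph -a restart osd    # restart every OSD listed in ceph.conf (-a means act on all hosts)
  /etc/init.d/ceph restart osd.0     # or restart a single daemon on its own host

This only clears in-memory messenger state; it does not touch the on-disk object stores.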
Probably not — that wrong node thing can occur for a lot of different
reasons, some of which matter and most of which don't. Sam's looking
into the problem; there's something going wrong with the CRUSH
calculations or the monitor PG placement overrides or something...
-Greg
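
Once the decompiled map is at hand, a rough sketch of sanity-checking placement offline with crushtool, assuming a build with --test support (option names may vary between versions):

  crushtool -i crushmap --test --num-rep 2 --show-bad-mappings   # report inputs that map to fewer than 2 OSDs
  crushtool -i crushmap --test --num-rep 2 --show-statistics     # summarize how inputs spread across the OSDs

Bad mappings here would point at the CRUSH calculation rather than the messenger.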
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

