2011/8/3 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> Hello,
>
> I'm running ceph 0.32, and for a while now it looks like if a monitor fails,
> the cluster doesn't fail over to a new one.
>
> I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is
> also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w
> run on the cmds+cmon node prints nothing but
>
> 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault
>
> indefinitely, and the filesystem stops working (processes using files in it
> block forever). It looks like the client keeps trying to connect to the killed
> monitor instead of failing over to a working one.
>
> The first message after killing the node was:
>
> 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
>
> Do you have any idea what I'm doing wrong?

Do you have the mon logs, or any core files?

Thanks,
Yehuda
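
For reference, a client can only hunt to monitors it already knows about, so one
thing worth checking is whether all three mons are listed both in ceph.conf and
at mount time. A minimal sketch, using placeholder addresses rather than the
ones from this report:

    # ceph.conf: one [mon.X] section per monitor, old-style 0.3x syntax
    [mon.a]
            host = node1
            mon addr = 192.168.0.1:6789
    [mon.b]
            host = node2
            mon addr = 192.168.0.2:6789
    [mon.c]
            host = node3
            mon addr = 192.168.0.3:6789

    # kernel client mount: list every monitor, comma-separated, so it can
    # fail over if the first one dies
    mount -t ceph 192.168.0.1:6789,192.168.0.2:6789,192.168.0.3:6789:/ /mnt/ceph

If the client was mounted against only the killed monitor's address, it would
keep retrying that one address and show exactly this kind of hang.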