On August 4, 2011 20:14:54 Yehuda Sadeh Weinraub wrote:
> 2011/8/3 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> > I'm running ceph 0.32, and for a while now it looks like if a monitor
> > fails, the cluster doesn't fail over to a new one.
> >
> > I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon,
> > which is also running the client. If I stop one of the cmds+cosd+cmon
> > nodes, ceph -w running on the cmds+cmon node reports nothing but
> >
> > 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> > <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first
> > fault
> >
> > indefinitely, and the filesystem stops working (processes using files
> > on it block forever). It looks like the client keeps trying to connect
> > to the killed monitor instead of failing over to a working one.
> >
> > The first message after killing the node was:
> >
> > 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
> >
> > Do you have any idea what I'm doing wrong?
>
> Do you have the mon logs, or any core files?

No, and I have nuked my ceph cluster since then, because I realized that
the monmap had somehow gotten corrupted. When I ran the client with the -m
option, it reported errors saying it was unable to connect to some strange
hostnames containing binary characters. I guess the monmap got broken
during the torture tests I put the cluster through.

I'll report back if I see anything like this with the fresh cluster.

Thanks,
--
cc
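
PS: In case it helps anyone hitting something similar, this is roughly how
I plan to sanity-check the monmap next time, before nuking anything. The
monitor address and output path below are just placeholder examples, not
from my actual setup:

    # Point the client at a monitor that is known to be up and see if it answers
    ceph -m 192.168.0.1:6789 -s

    # Pull the current monmap from the quorum and inspect its entries;
    # garbage hostnames/addresses here would explain the endless mon hunting
    ceph -m 192.168.0.1:6789 mon getmap -o /tmp/monmap
    monmaptool --print /tmp/monmap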