Monitors fallen apart

Székelyi Szabolcs <szekelyi@xxxxxxx> · Wed, 3 Aug 2011 11:16:37 +0200

Hello,

I'm running ceph 0.32, and for a while now it has looked as though, when a 
monitor fails, the cluster doesn't fail over to another one.

I have three nodes: two running cmds+cosd+cmon, and one running cmds+cmon, 
which is also where the client runs. If I stop one of the cmds+cosd+cmon 
nodes, ceph -w run on the cmds+cmon node prints nothing but

2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >> 
<killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault

over and over, and the filesystem stops working (processes using files on it 
block forever). It looks like it tries to connect to the killed monitor 
instead of failing over to a working one.
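
As a sanity check on the expectation that the cluster should survive this, 
here is a minimal sketch of the majority arithmetic. It assumes the monitors 
need a strict majority of the configured set to form a quorum; the function 
names are hypothetical and are not part of ceph.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest number of monitors that is a strict majority of the set."""
    return total_monitors // 2 + 1


def has_quorum(total_monitors: int, alive_monitors: int) -> bool:
    """True if the surviving monitors can still form a majority."""
    return alive_monitors >= quorum_size(total_monitors)


if __name__ == "__main__":
    total = 3  # three cmon daemons, as in the setup described above
    for alive in range(total, -1, -1):
        state = "quorum" if has_quorum(total, alive) else "no quorum"
        print(f"{alive} of {total} monitors alive: {state}")
```

With three monitors, the two survivors are still a majority, so under that 
assumption stopping a single monitor node should not by itself block the 
filesystem.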

The first message after killing the node was:

2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon

Do you have any idea what I'm doing wrong?

Thanks,
-- 
cc
