Hello,

I'm running ceph 0.32, and for a while now it looks like the cluster does not fail over to another monitor when one fails.

I have three nodes: two with cmds+cosd+cmon, and one with cmds+cmon, which is also running the client.

If I stop one of the cmds+cosd+cmon nodes, ceph -w, run on the cmds+cmon node, prints nothing but the following line, repeated indefinitely:

2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >> <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault

The filesystem also stops working (processes using files in it block forever). It looks like the client keeps trying to connect to the killed monitor instead of failing over to a working one.

The first message after killing the node was:

2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon

Do you have any idea what I'm doing wrong?

Thanks,
--
cc