2011/8/3 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> Hello,
>
> I'm running ceph 0.32, and for a while now it looks like if a monitor fails,
> the cluster doesn't fail over to a new one.
>
> I have three nodes, two with cmds+cosd+cmon, and one with cmds+cmon, which is
> also running the client. If I stop one of the cmds+cosd+cmon nodes, ceph -w
> run on the cmds+cmon node prints nothing but
>
> 2011-08-03 11:10:47.291875 7f4f043d5700 -- <client_ip>:0/14633 >>
> <killed_node_ip>:6789/0 pipe(0x1a7f9c0 sd=4 pgs=0 cs=0 l=0).fault first fault
>
> indefinitely, and the filesystem stops working (processes using files in it
> block forever). It looks like the client keeps trying to connect to the killed
> monitor instead of failing over to a working one.
>
> The first message after killing the node was:
>
> 2011-08-03 10:57:40.687871 7f4f01563700 monclient: hunting for new mon
>
> Do you have any idea what I'm doing wrong?

Do you have the mon logs, or any core files?

Thanks,
Yehuda
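
For reference, a client can only hunt to monitors it already knows about, so one
thing worth checking is whether all three mons are listed both in ceph.conf and
at mount time. A minimal sketch, using placeholder addresses rather than the
ones from this report:

    # ceph.conf: one [mon.X] section per monitor, old-style 0.3x syntax
    [mon.a]
            host = node1
            mon addr = 192.168.0.1:6789
    [mon.b]
            host = node2
            mon addr = 192.168.0.2:6789
    [mon.c]
            host = node3
            mon addr = 192.168.0.3:6789

    # kernel client mount: list every monitor, comma-separated, so it can
    # fail over if the first one dies
    mount -t ceph 192.168.0.1:6789,192.168.0.2:6789,192.168.0.3:6789:/ /mnt/ceph

If the client was mounted against only the killed monitor's address, it would
keep retrying that one address and show exactly this kind of hang.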