Hi Amar!

I ran the test as you asked, and glusterd dies again.

virt-zabbix-02:~ # mv /etc/glusterd /etc/glusterd.old
virt-zabbix-03:~ # mv /etc/glusterd /etc/glusterd.old

virt-zabbix-02:~ # rcglusterd start
Starting glusterd: done
virt-zabbix-03:~ # rcglusterd start
Starting glusterd: done

On both nodes the directory "/etc/glusterd", with all its subdirectories and files, was created anew.
Both logs look okay (I have no IB devices, only a TCP connection):
[2011-01-14 10:26:35.949930] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2011-01-14 10:26:35.951803] C [rdma.c:3916:rdma_init] rpc-transport/rdma: No IB devices found
[2011-01-14 10:26:35.951828] E [rdma.c:4789:init] rdma.management: Failed to initialize IB Device
[2011-01-14 10:26:35.951842] E [rpc-transport.c:971:rpc_transport_load] rpc-transport: 'rdma' initialization failed
[2011-01-14 10:26:35.951926] I [glusterd.c:96:glusterd_uuid_init] glusterd: generated UUID: 2e633473-734a-45ff-84c8-3e0b8c6f399f
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2: type mgmt/glusterd
  3: option working-directory /etc/glusterd
  4: option transport-type socket,rdma,tcp
  5: option transport.socket.keepalive-time 10
  6: option transport.socket.keepalive-interval 2
  7: end-volume
  8:
+------------------------------------------------------------------------------+

virt-zabbix-02:~ # gluster peer probe virt-zabbix-03

No output was returned, and glusterd died on virt-zabbix-02.

Checking the processes:
virt-zabbix-02:~ # ps auxw | grep -i [g]luster
root 25901 0.0 0.1 8400 724 pts/1 S+ 10:24 0:00 tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log

virt-zabbix-03:~ # ps auxw | grep -i [g]luster
root 23204 0.0 0.1 8400 724 pts/1 S+ 10:24 0:00 tail -f /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
root 23235 0.0 1.8 57428 9956 ? Ssl 10:26 0:00 /usr/sbin/glusterd

virt-zabbix-02 log:
[2011-01-14 10:26:55.418833] I [glusterd-handler.c:563:glusterd_handle_cli_probe] glusterd: Received CLI probe req virt-zabbix-03 24007
[2011-01-14 10:26:55.419424] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: virt-zabbix-03
[2011-01-14 10:26:55.419446] I [glusterd-handler.c:2618:glusterd_probe_begin] glusterd: Unable to find peerinfo for host: virt-zabbix-03 (24007)
[2011-01-14 10:26:55.421766] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-01-14 10:26:55.422701] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
[2011-01-14 10:26:55.427538] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
[2011-01-14 10:26:55.480741] I [glusterd3_1-mops.c:80:glusterd3_1_probe_cbk] glusterd: Received probe resp from uuid: 43bdc56b-c2fb-4464-9dac-2d3f8a37f3e2, host: virt-zabbix-03
[2011-01-14 10:26:55.480817] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-14 10:26:55.480833] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
pending frames:

patchset: v3.1.1
signal received: 11
time of crash: 2011-01-14 10:26:55
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.1
/lib64/libc.so.6(+0x329e0)[0x7f667578b9e0]
/usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7f66760f806c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7f66760fd878]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_submit_request+0x15e)[0x7f66740533be]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_friend_add+0x11b)[0x7f6674057f3b]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(+0x27b17)[0x7f6674040b17]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x175)[0x7f6674040675]
/usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_probe_cbk+0x495)[0x7f667405b1f5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7f66760fca94]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc8)[0x7f66760fccd8]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7f66760f7f2e]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7f6673e11f9f]
/usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x114)[0x7f6673e120d4]
/usr/lib64/libglusterfs.so.0(+0x3a384)[0x7f667633e384]
/usr/sbin/glusterd(main+0x23c)[0x4055dc]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f6675777bc6]
/usr/sbin/glusterd[0x4032c9]
---------

virt-zabbix-03 log:
[2011-01-14 10:26:55.426854] I [glusterd-handler.c:2387:glusterd_handle_probe_query] glusterd: Received probe from uuid: 2e633473-734a-45ff-84c8-3e0b8c6f399f
[2011-01-14 10:26:55.426970] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
[2011-01-14 10:26:55.471791] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: 192.168.8.104
[2011-01-14 10:26:55.471820] I [glusterd-handler.c:2401:glusterd_handle_probe_query] glusterd: Unable to find peerinfo for host: 192.168.8.104 (24007)
[2011-01-14 10:26:55.474557] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-01-14 10:26:55.479430] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
[2011-01-14 10:26:55.479555] I [glusterd-handler.c:2422:glusterd_handle_probe_query] glusterd: Responded to virt-zabbix-03, op_ret: 0, op_errno: 0, ret: 0
[2011-01-14 10:27:01.488486] E [socket.c:1656:socket_connect_finish] management: connection to 192.168.8.104:24007 failed (Connection refused)

Regards,
Markus

On 14.01.2011 06:56, Amar Tumballi wrote:
> Hi Markus,
>
> This is the first time I am coming across this particular backtrace/crash. Looking into it now.
> Have filed a bug @ http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2293
>
> Meanwhile, can you try the steps below and see if they fix the issue:
>
> * stop all gluster processes (glusterd/glusterfs/glusterfsd)
>
> * mv glusterd config directory
>
> bash# mv /etc/glusterd /etc/glusterd.old (on both machines)
>
> * start glusterd on both machines, do gluster peer probe now
>
> Let me know the output.
>
> Regards,
> Amar
>
>
> 2011/1/14 Markus Fröhlich <markus.froehlich at xidras.com>
>
> I have two servers with SLES11 SP1 x86_64 and have compiled the latest version of glusterfs, 3.1.1.
> The firewall is disabled on both nodes and they are on the same network.
>
> I put both hostnames in the hosts file, so that each node can resolve the other's hostname correctly:
> 192.168.8.104 virt-zabbix-02
> 192.168.8.105 virt-zabbix-03
>
> This is my config on both nodes: "/etc/glusterfs/glusterd.vol"
> volume management
> type mgmt/glusterd
> option working-directory /etc/glusterd
> option transport-type socket,rdma
> option transport.socket.keepalive-time 10
> option transport.socket.keepalive-interval 2
> end-volume
>
> virt-zabbix-02# gluster peer status
> No peers present
>
> log:
> [2011-01-13 19:53:31.576554] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req
>
> This is okay, but when I try to add the other node to the cluster, glusterd dies on
> "virt-zabbix-02", where I type the command, and a core-dump file is generated:
> virt-zabbix-02# gluster peer probe virt-zabbix-03
>
> log virt-zabbix-02:
> [2011-01-13 19:54:29.284735] I [glusterd-handler.c:563:glusterd_handle_cli_probe] glusterd: Received CLI probe req virt-zabbix-03 24007
> [2011-01-13 19:54:29.285110] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: virt-zabbix-03
> [2011-01-13 19:54:29.285136] I [glusterd-handler.c:2618:glusterd_probe_begin] glusterd: Unable to find peerinfo for host: virt-zabbix-03 (24007)
> [2011-01-13 19:54:29.287625] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
> [2011-01-13 19:54:29.288496] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
> [2011-01-13 19:54:29.293369] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
> [2011-01-13 19:54:29.302062] I [glusterd3_1-mops.c:80:glusterd3_1_probe_cbk] glusterd: Received probe resp from uuid: 255540da-4b86-46f2-963c-3214e2c5e28a, host: virt-zabbix-03
> [2011-01-13 19:54:29.302097] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
> [2011-01-13 19:54:29.302111] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
> pending frames:
>
> patchset: v3.1.1
> signal received: 11
> time of crash: 2011-01-13 19:54:29
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.1.1
> /lib64/libc.so.6(+0x329e0)[0x7f1cbbb589e0]
> /usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7f1cbc4c506c]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7f1cbc4ca878]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_submit_request+0x15e)[0x7f1cba4203be]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_friend_add+0x11b)[0x7f1cba424f3b]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(+0x27b17)[0x7f1cba40db17]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x175)[0x7f1cba40d675]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_probe_cbk+0x495)[0x7f1cba4281f5]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7f1cbc4c9a94]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc8)[0x7f1cbc4c9cd8]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7f1cbc4c4f2e]
> /usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7f1cba1def9f]
> /usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x114)[0x7f1cba1df0d4]
> /usr/lib64/libglusterfs.so.0(+0x3a384)[0x7f1cbc70b384]
> /usr/sbin/glusterd(main+0x23c)[0x4055dc]
> /lib64/libc.so.6(__libc_start_main+0xe6)[0x7f1cbbb44bc6]
> /usr/sbin/glusterd[0x4032c9]
> ---------
>
> log virt-zabbix-03:
> [2011-01-13 19:54:29.296723] I [glusterd-handler.c:2387:glusterd_handle_probe_query] glusterd: Received probe from uuid: a9b660c5-456d-4e96-9bdd-d23c917ae941
> [2011-01-13 19:54:29.296802] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
> [2011-01-13 19:54:29.297224] I [glusterd-handler.c:398:glusterd_friend_find] glusterd: Unable to find hostname: 192.168.8.104
> [2011-01-13 19:54:29.297278] I [glusterd-handler.c:2401:glusterd_handle_probe_query] glusterd: Unable to find peerinfo for host: 192.168.8.104 (24007)
> [2011-01-13 19:54:29.300119] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
> [2011-01-13 19:54:29.304856] I [glusterd-handler.c:2600:glusterd_friend_add] glusterd: connect returned 0
> [2011-01-13 19:54:29.304994] I [glusterd-handler.c:2422:glusterd_handle_probe_query] glusterd: Responded to virt-zabbix-03, op_ret: 0, op_errno: 0, ret: 0
> [2011-01-13 19:54:35.314773] E [socket.c:1656:socket_connect_finish] management: connection to 192.168.8.104:24007 failed (Connection refused)
>
>
> So I started glusterd on virt-zabbix-02 again. A few seconds later glusterd died on the
> other node, virt-zabbix-03, and a core-dump file was generated there as well.
>
> log virt-zabbix-02:
> [2011-01-13 19:57:08.911495] I [glusterd-handler.c:2387:glusterd_handle_probe_query] glusterd: Received probe from uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
> [2011-01-13 19:57:08.911559] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
> [2011-01-13 19:57:08.911643] I [glusterd-utils.c:2140:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.105 found.. state: 0
> [2011-01-13 19:57:08.911715] I [glusterd-handler.c:2422:glusterd_handle_probe_query] glusterd: Responded to 192.168.8.104, op_ret: 0, op_errno: 0, ret: 0
> [2011-01-13 19:57:11.956152] E [socket.c:1656:socket_connect_finish] management: connection to 192.168.8.105:24007 failed (Connection refused)
>
>
> log virt-zabbix-03:
> [2011-01-13 19:57:08.913897] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.104 found.. state: 0
> [2011-01-13 19:57:08.915052] I [glusterd3_1-mops.c:80:glusterd3_1_probe_cbk] glusterd: Received probe resp from uuid: a9b660c5-456d-4e96-9bdd-d23c917ae941, host: 192.168.8.104
> [2011-01-13 19:57:08.915085] I [glusterd-handler.c:386:glusterd_friend_find] glusterd: Unable to find peer by uuid
> [2011-01-13 19:57:08.915100] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend 192.168.8.104 found.. state: 0
> pending frames:
>
> patchset: v3.1.1
> signal received: 11
> time of crash: 2011-01-13 19:57:08
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.1.1
> /lib64/libc.so.6(+0x329e0)[0x7fe84e6ee9e0]
> /usr/lib64/libgfrpc.so.0(rpc_transport_connect+0xc)[0x7fe84f05b06c]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3d8)[0x7fe84f060878]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_submit_request+0x15e)[0x7fe84cfb63be]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_friend_add+0x11b)[0x7fe84cfbaf3b]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(+0x27b17)[0x7fe84cfa3b17]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd_friend_sm+0x175)[0x7fe84cfa3675]
> /usr/lib64/glusterfs/3.1.1/xlator/mgmt/glusterd.so(glusterd3_1_probe_cbk+0x495)[0x7fe84cfbe1f5]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa4)[0x7fe84f05fa94]
> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc8)[0x7fe84f05fcd8]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2e)[0x7fe84f05af2e]
> /usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x7fe84cd74f9f]
> /usr/lib64/glusterfs/3.1.1/rpc-transport/socket.so(socket_event_handler+0x114)[0x7fe84cd750d4]
> /usr/lib64/libglusterfs.so.0(+0x3a384)[0x7fe84f2a1384]
> /usr/sbin/glusterd(main+0x23c)[0x4055dc]
> /lib64/libc.so.6(__libc_start_main+0xe6)[0x7fe84e6dabc6]
> /usr/sbin/glusterd[0x4032c9]
> ---------
>
>
> Starting glusterd on virt-zabbix-03 again makes glusterd on virt-zabbix-02 die, and so on.
> So I made sure the daemon is stopped on both hosts.
> The peer files generated on the two nodes differ: one is named after the hostname, the other
> after the IP:
> virt-zabbix-02:# cat /etc/glusterd/peers/virt-zabbix-03
> uuid=
> state=0
> hostname1=virt-zabbix-03
>
> virt-zabbix-03:# cat /etc/glusterd/peers/192.168.8.104
> uuid=
> state=0
> hostname1=192.168.8.104
>
>
> The uuid is empty in both files, so I filled it in with the UUID from the other node's
> "/etc/glusterd/glusterd.info" file:
> virt-zabbix-02:/ # cat /etc/glusterd/glusterd.info
> UUID=a9b660c5-456d-4e96-9bdd-d23c917ae941
> virt-zabbix-03:/ # cat etc/glusterd/glusterd.info
> UUID=255540da-4b86-46f2-963c-3214e2c5e28a
>
> virt-zabbix-02:/ # cat /etc/glusterd/peers/virt-zabbix-03
> uuid=255540da-4b86-46f2-963c-3214e2c5e28a
> state=0
> hostname1=virt-zabbix-03
>
> virt-zabbix-03:/ # cat /etc/glusterd/peers/192.168.8.104
> uuid=a9b660c5-456d-4e96-9bdd-d23c917ae941
> state=0
> hostname1=192.168.8.104
>
>
> Now I start glusterd on both nodes again; both daemons keep running and I can run the
> command:
> virt-zabbix-02:/ # gluster peer status
> Number of Peers: 1
>
> Hostname: virt-zabbix-03
> Uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
> State: Establishing Connection (Connected)
>
> I'd like to create my first test volume:
> gluster volume create mytest transport tcp virt-zabbix-02:/gfs1 virt-zabbix-03:/gfs1
> Creation of volume mytest has been unsuccessful
> Host virt-zabbix-03 not connected
>
> log virt-zabbix-02:
> [2011-01-13 20:11:10.706931] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req
> [2011-01-13 20:12:20.950199] I [glusterd-handler.c:785:glusterd_handle_create_volume] glusterd: Received create volume req
> [2011-01-13 20:12:20.950907] I [glusterd-utils.c:2101:glusterd_friend_find_by_hostname] glusterd: Friend virt-zabbix-03 found.. state: 0
> [2011-01-13 20:12:20.950935] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Establishing Connection
> [2011-01-13 20:12:20.950950] E [glusterd-utils.c:2324:glusterd_new_brick_validate] glusterd: Host virt-zabbix-03 not connected
> [2011-01-13 20:12:20.951005] E [glusterd-handler.c:906:glusterd_handle_create_volume] glusterd: Unlock on opinfo failed
>
> No log entries on virt-zabbix-03.
>
> Not connected? Strange! Status info again:
> virt-zabbix-02:/ # gluster peer status
> Number of Peers: 1
>
> Hostname: virt-zabbix-03
> Uuid: 255540da-4b86-46f2-963c-3214e2c5e28a
> State: Establishing Connection (Connected)
>
> log virt-zabbix-02:
> [2011-01-13 20:13:24.601901] I [glusterd-handler.c:674:glusterd_handle_cli_list_friends] glusterd: Received cli list req
>
>
> So I restart glusterd on virt-zabbix-03, and the daemon on virt-zabbix-02 dies again.
>
> Does anyone have any idea what's going wrong?
>
> Kind regards

--
Kind regards,

Markus Fröhlich
Technician

Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0) 2983 201 30503
Fax: +43 (0) 2983 201 305039
Email: markus.froehlich at xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024
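
P.S. The empty "uuid=" entries in the peer files quoted above can be spotted without opening each file by hand. What follows is only a minimal sketch in Python, assuming the peer files use the plain key=value layout shown in the quoted message and live under /etc/glusterd/peers. The function names parse_peer_file and peers_missing_uuid are made up for illustration and are not part of GlusterFS.

```python
import sys
from pathlib import Path


def parse_peer_file(path):
    """Parse a key=value peer file into a dict.

    Assumes the layout seen in this thread (uuid=, state=, hostname1=).
    """
    entries = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def peers_missing_uuid(peers_dir):
    """Return the sorted names of peer files whose uuid is absent or empty."""
    missing = []
    for path in sorted(Path(peers_dir).iterdir()):
        if path.is_file() and not parse_peer_file(path).get("uuid"):
            missing.append(path.name)
    return missing


if __name__ == "__main__":
    # Default directory taken from the thread; pass another path as argv[1].
    peers_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "/etc/glusterd/peers")
    if peers_dir.is_dir():
        for name in peers_missing_uuid(peers_dir):
            print(f"{peers_dir / name}: uuid is empty")
    else:
        print(f"{peers_dir}: not a directory")
```

Run on each node with glusterd stopped, it only reads the files and reports which ones have no uuid. It does not fill anything in, and it says nothing about why glusterd wrote them that way.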