[root at tuepdc glusterfs]# grep crash *
mnt-glusterfs.log:time of crash: 2013-03-26 16:29:03
mnt-glusterfs.log:time of crash: 2013-03-27 11:27:09
mnt-glusterfs.log:time of crash: 2013-03-27 12:27:52
mnt-glusterfs.log:time of crash: 2013-03-27 18:03:47
mnt-glusterfs.log:time of crash: 2013-03-28 08:57:07
mnt-glusterfs.log:time of crash: 2013-03-28 09:30:22
mnt-glusterfs.log:time of crash: 2013-03-28 10:47:06
=====================================================================
ambavol-replicate-0: size differs for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-Mai2013.xls
[2013-03-26 16:28:57.650746] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-Mai2013.xls
[2013-03-26 16:29:03.808123] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-sambavol-client-1: remote operation failed: Stale NFS file handle
[2013-03-26 16:29:03.890754] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.890807] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.891570] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
[2013-03-26 16:29:03.892425] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
[2013-03-26 16:29:03.892445] I [afr-self-heal-data.c:1002:afr_sh_data_open_cbk] 0-sambavol-replicate-0: open of /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls failed on child sambavol-client-1 (No such file or directory)
[2013-03-26 16:29:03.892550] I [client3_1-fops.c:1226:client3_1_inodelk_cbk] 0-sambavol-client-0: remote operation failed: Invalid argument
[2013-03-26 16:29:03.892567] I [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-sambavol-replicate-0: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls: unlock failed Invalid argument
[2013-03-26 16:29:03.893072] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.894570] I [afr-common.c:581:afr_lookup_collect_xattr] 0-sambavol-replicate-0: data self-heal is pending for /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls.
[2013-03-26 16:29:03.894594] W [afr-common.c:634:afr_lookup_self_heal_check] 0-sambavol-replicate-0: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls: gfid different on subvolume
[2013-03-26 16:29:03.894610] I [afr-common.c:735:afr_lookup_done] 0-sambavol-replicate-0: background meta-data data self-heal triggered. path: /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.895996] I [afr-self-heal-common.c:537:afr_sh_mark_sources] 0-sambavol-replicate-0: split-brain possible, no source detected
[2013-03-26 16:29:03.896014] E [afr-self-heal-metadata.c:521:afr_sh_metadata_fix] 0-sambavol-replicate-0: Unable to self-heal permissions/ownership of '/windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls' (possible split-brain). Please fix the file on all backend volumes
[2013-03-26 16:29:03.896954] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-sambavol-replicate-0: aborting selfheal of /windows/winuser/schoell/Kopie von Arbeitszeitnachweis_Schoell-März2013.xls
[2013-03-26 16:29:03.970126] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
pending frames:
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
patchset: v3.2.0
signal received: 11
time of crash: 2013-03-26 16:29:03
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
============================================================================
/winuser/steimle/Buha/Stundennachweis/Stundennachweis 2013/Stundennachweis.xls' (possible split-brain). Please fix the file on all backend volumes
[2013-03-27 11:27:09.431579] I [afr-self-heal-metadata.c:81:afr_sh_metadata_done] 0-sambavol-replicate-0: aborting selfheal of /windows/winuser/steimle/Buha/Stundennachweis/Stundennachweis 2013/Stundennachweis.xls
[2013-03-27 11:27:09.432480] I [client3_1-fops.c:366:client3_1_open_cbk] 0-sambavol-client-1: remote operation failed: No such file or directory
pending frames:
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
patchset: v3.2.0
signal received: 11
time of crash: 2013-03-27 11:27:09
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.0
/lib64/libc.so.6[0x30c0a302d0]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/io-cache.so(ioc_open_cbk+0x9b)[0x2aaaaba4d7fb]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/read-ahead.so(ra_open_cbk+0x205)[0x2aaaab842935]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/performance/write-behind.so(wb_open_cbk+0xf4)[0x2aaaab632784]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/cluster/replicate.so(afr_open_cbk+0x232)[0x2aaaab3f8a32]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/xlator/protocol/client.so(client3_1_open_cbk+0x19f)[0x2aaaab1bfdaf]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x2b7efe01b3d2]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x2b7efe01b5cd]
/opt/glusterfs/3.2.0/lib64/libgfrpc.so.0(rpc_transport_notify+0x27)[0x2b7efe0162e7]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaaaad705af]
/opt/glusterfs/3.2.0/lib64/glusterfs/3.2.0/rpc-transport/socket.so(socket_event_handler+0x188)[0x2aaaaad70758]
/opt/glusterfs/3.2.0/lib64/libglusterfs.so.0[0x2b7efdddb811]
/opt/glusterfs/3.2.0/sbin/glusterfs(main+0x407)[0x405577]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x30c0a1d994]
/opt/glusterfs/3.2.0/sbin/glusterfs[0x4036f9]
---------
[2013-03-27 12:14:26.544930] W [write-behind.c:3023:init] 0-sambavol-write-behind: disabling write-behind for first 0 bytes
[2013-03-27 12:14:26.544980] I [client.c:1987:build_client_config] 0-sambavol-client-1: setting ping-timeout to 5
[2013-03-27 12:14:26.547225] I [client.c:1987:build_client_config] 0-sambavol-client-0: setting ping-timeout to 5
[2013-03-27 12:14:26.549258] I [client.c:1935:notify] 0-sambavol-client-0: parent translators are ready, attempting connect on transport
[2013-03-27 12:14:26.553418] I [client.c:1935:notify] 0-sambavol-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume sambavol-client-0
  2:     type protocol/client
  3:     option remote-host 192.168.130.199
  4:     option remote-subvolume /raid5hs/glusterfs/export
  5:     option transport-type tcp
================================================================================

Should I look for anything more specific?
-----------------------------------------------
EDV Daniel Müller

Head of IT
Tropenklinik Paul-Lechler-Krankenhaus
Paul-Lechler-Str. 24
72076 Tübingen

Tel.: 07071/206-463, Fax: 07071/206-499
eMail: mueller at tropenklinik.de
Internet: www.tropenklinik.de
-----------------------------------------------

-----Original Message-----
From: Pranith Kumar K [mailto:pkarampu at redhat.com]
Sent: Thursday, 28 March 2013 12:34
To: mueller at tropenklinik.de
Cc: gluster-users at gluster.org; Reinhard Marstaller
Subject: Re: Glusterfs gives up with endpoint not connected

On 03/28/2013 03:48 PM, Daniel Müller wrote:
> Dear all,
>
> Right out of the blue glusterfs is not working fine any more; every now
> and then it stops working, telling me "Endpoint not connected" and
> writing core files:
>
> [root at tuepdc /]# file core.15288
> core.15288: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV),
> SVR4-style, from 'glusterfs'
>
> My version:
> [root at tuepdc /]# glusterfs --version
> glusterfs 3.2.0 built on Apr 22 2011 18:35:40
> Repository revision: v3.2.0
> Copyright (c) 2006-2010 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> Affero General Public License.
>
> My /var/log/glusterfs/bricks/raid5hs-glusterfs-export.log:
>
> [2013-03-28 10:47:07.243980] I [server.c:438:server_rpc_notify] 0-sambavol-server: disconnected connection from 192.168.130.199:1023
> [2013-03-28 10:47:07.244000] I [server-helpers.c:783:server_connection_destroy] 0-sambavol-server: destroyed connection of tuepdc.local-16600-2013/03/28-09:32:28:258428-sambavol-client-0
>
> [root at tuepdc bricks]# gluster volume info
>
> Volume Name: sambavol
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.130.199:/raid5hs/glusterfs/export
> Brick2: 192.168.130.200:/raid5hs/glusterfs/export
> Options Reconfigured:
> network.ping-timeout: 5
> performance.quick-read: on
>
> Gluster is running on ext3 raid5 HS on both hosts.
> [root at tuepdc bricks]# mdadm --detail /dev/md0
> /dev/md0:
>         Version : 0.90
>   Creation Time : Wed May 11 10:08:30 2011
>      Raid Level : raid5
>      Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>    Raid Devices : 3
>   Total Devices : 4
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Mar 28 11:13:21 2013
>           State : clean
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 1
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : c484e093:018a2517:56e38f5e:1a216491
>          Events : 0.250
>
>     Number   Major   Minor   RaidDevice   State
>        0       8      49         0        active sync   /dev/sdd1
>        1       8      65         1        active sync   /dev/sde1
>        2       8      97         2        active sync   /dev/sdg1
>
>        3       8      81         -        spare         /dev/sdf1
>
> [root at tuepdc glusterfs]# tail -f mnt-glusterfs.log
> [2013-03-28 10:57:40.882566] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-0: changing port to 24009 (from 0)
> [2013-03-28 10:57:40.883636] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-sambavol-client-1: changing port to 24009 (from 0)
> [2013-03-28 10:57:44.806649] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.806857] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-0: Connected to 192.168.130.199:24009, attached to remote volume '/raid5hs/glusterfs/export'.
> [2013-03-28 10:57:44.806876] I [afr-common.c:2514:afr_notify] 0-sambavol-replicate-0: Subvolume 'sambavol-client-0' came back up; going online.
> [2013-03-28 10:57:44.811557] I [fuse-bridge.c:3316:fuse_graph_setup] 0-fuse: switched to graph 0
> [2013-03-28 10:57:44.811773] I [fuse-bridge.c:2897:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
> [2013-03-28 10:57:44.812139] I [afr-common.c:836:afr_fresh_lookup_cbk] 0-sambavol-replicate-0: added root inode
> [2013-03-28 10:57:44.812217] I [client-handshake.c:1080:select_server_supported_programs] 0-sambavol-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
> [2013-03-28 10:57:44.812767] I [client-handshake.c:913:client_setvolume_cbk] 0-sambavol-client-1: Connected to 192.168.130.200:24009, attached to remote volume '/raid5hs/glusterfs/export'.
>
> How can I fix this issue?
>
> Daniel
>
> -----------------------------------------------
> EDV Daniel Müller
>
> Head of IT
> Tropenklinik Paul-Lechler-Krankenhaus
> Paul-Lechler-Str. 24
> 72076 Tübingen
>
> Tel.: 07071/206-463, Fax: 07071/206-499
> eMail: mueller at tropenklinik.de
> Internet: www.tropenklinik.de
> -----------------------------------------------
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

Could you paste the traceback that is printed in the mount's log file for that crash? Search for "crash" in the logs; you will see the trace right after that line. Paste it here.

Pranith.
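Pulling the trace Pranith asks for can be scripted. A minimal sketch, assuming the log layout shown in the excerpts above (a "time of crash" marker followed by the backtrace frames); the sample file here is a stand-in for the real /var/log/glusterfs/mnt-glusterfs.log:

```shell
# Create a small stand-in log, trimmed from the excerpts in this thread.
cat > /tmp/mnt-glusterfs-sample.log <<'EOF'
pending frames:
frame : type(1) op(OPEN)
patchset: v3.2.0
signal received: 11
time of crash: 2013-03-27 11:27:09
configuration details:
backtrace 1
package-string: glusterfs 3.2.0
/lib64/libc.so.6[0x30c0a302d0]
/opt/glusterfs/3.2.0/sbin/glusterfs(main+0x407)[0x405577]
EOF

# Print each crash marker plus the 20 lines that follow it (the trace);
# widen -A if a real trace has more frames.
grep -A 20 'time of crash' /tmp/mnt-glusterfs-sample.log
```

For a symbol-resolved trace, the core files mentioned earlier can also be opened with gdb (e.g. `gdb /opt/glusterfs/3.2.0/sbin/glusterfs core.15288`, then `bt`), assuming the binary and any debug symbols match the version that crashed.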