Hi, I'm having issues with Gluster NFS: it keeps crashing after a few hours under medium load.
OS: CentOS 7.2
Gluster version 3.7.13
Gluster info:
Volume Name: vlm01
Type: Distributed-Replicate
Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2
Status: Started
Number of Bricks: 7 x 3 = 21
Transport-type: tcp
Bricks:
Brick1: gfs01:/bricks/b01/vlm01
Brick2: gfs02:/bricks/b01/vlm01
Brick3: gfs03:/bricks/b01/vlm01
Brick4: gfs01:/bricks/b02/vlm01
Brick5: gfs02:/bricks/b02/vlm01
Brick6: gfs03:/bricks/b02/vlm01
Brick7: gfs01:/bricks/b03/vlm01
Brick8: gfs02:/bricks/b03/vlm01
Brick9: gfs03:/bricks/b03/vlm01
Brick10: gfs01:/bricks/b04/vlm01
Brick11: gfs02:/bricks/b04/vlm01
Brick12: gfs03:/bricks/b04/vlm01
Brick13: gfs01:/bricks/b05/vlm01
Brick14: gfs02:/bricks/b05/vlm01
Brick15: gfs03:/bricks/b05/vlm01
Brick16: gfs01:/bricks/b06/vlm01
Brick17: gfs02:/bricks/b06/vlm01
Brick18: gfs03:/bricks/b06/vlm01
Brick19: gfs01:/bricks/b07/vlm01
Brick20: gfs02:/bricks/b07/vlm01
Brick21: gfs03:/bricks/b07/vlm01
Options Reconfigured:
auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56
features.shard: on
features.shard-block-size: 16MB
cluster.self-heal-window-size: 128
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
network.ping-timeout: 10
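
For reading the nfs.log excerpt further down, it can help to translate the client translator names (vlm01-client-N) back to bricks. The sketch below assumes that client indices are zero-based and follow the brick order listed above, and that each replica set is three consecutive bricks; the function name brick_layout is made up for this example and is not part of Gluster.

```python
# Sketch only: rebuild the brick list shown in the volume info and map
# each client translator name to its replica set and brick.
# Assumption: vlm01-client-N corresponds to Brick(N+1) in the list above.

def brick_layout(volume, hosts, brick_dirs, replica):
    """Return {client_name: (replicate_name, brick_path)}."""
    # Bricks are listed directory by directory, cycling through the hosts.
    bricks = [f"{host}:/bricks/{d}/{volume}" for d in brick_dirs for host in hosts]
    layout = {}
    for index, brick in enumerate(bricks):
        client = f"{volume}-client-{index}"
        replicate = f"{volume}-replicate-{index // replica}"
        layout[client] = (replicate, brick)
    return layout


layout = brick_layout(
    volume="vlm01",
    hosts=["gfs01", "gfs02", "gfs03"],
    brick_dirs=[f"b{n:02d}" for n in range(1, 8)],
    replica=3,
)

for name in ("vlm01-client-12", "vlm01-client-18", "vlm01-client-20"):
    replicate, brick = layout[name]
    print(name, replicate, brick)
```

Under these assumptions vlm01-client-18 maps to gfs01:/bricks/b07/vlm01 in vlm01-replicate-6, which agrees with the "attached to remote volume '/bricks/b07/vlm01'" and "0-vlm01-replicate-6" entries in the log.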
#####
Gluster status:
Status of volume: vlm01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs01:/bricks/b01/vlm01               49159     0          Y       3050
Brick gfs02:/bricks/b01/vlm01               49158     0          Y       3012
Brick gfs03:/bricks/b01/vlm01               49158     0          Y       3889
Brick gfs01:/bricks/b02/vlm01               49160     0          Y       3058
Brick gfs02:/bricks/b02/vlm01               49159     0          Y       3011
Brick gfs03:/bricks/b02/vlm01               49159     0          Y       3888
Brick gfs01:/bricks/b03/vlm01               49161     0          Y       3067
Brick gfs02:/bricks/b03/vlm01               49160     0          Y       3024
Brick gfs03:/bricks/b03/vlm01               49160     0          Y       3899
Brick gfs01:/bricks/b04/vlm01               49162     0          Y       3057
Brick gfs02:/bricks/b04/vlm01               49161     0          Y       3035
Brick gfs03:/bricks/b04/vlm01               49161     0          Y       3898
Brick gfs01:/bricks/b05/vlm01               49163     0          Y       3075
Brick gfs02:/bricks/b05/vlm01               49162     0          Y       3030
Brick gfs03:/bricks/b05/vlm01               49162     0          Y       3914
Brick gfs01:/bricks/b06/vlm01               49164     0          Y       3091
Brick gfs02:/bricks/b06/vlm01               49163     0          Y       3048
Brick gfs03:/bricks/b06/vlm01               49163     0          Y       3913
Brick gfs01:/bricks/b07/vlm01               49165     0          Y       3080
Brick gfs02:/bricks/b07/vlm01               49164     0          Y       3042
Brick gfs03:/bricks/b07/vlm01               49164     0          Y       3908
NFS Server on localhost                     2049      0          Y       28926
Self-heal Daemon on localhost               N/A       N/A        Y       28934
NFS Server on gfs02                         2049      0          Y       9944
Self-heal Daemon on gfs02                   N/A       N/A        Y       9953
NFS Server on gfs01                         2049      0          Y       46993
Self-heal Daemon on gfs01                   N/A       N/A        Y       47003
Task Status of Volume vlm01
------------------------------------------------------------------------------
There are no active volume tasks
#####
dmesg:
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: patchset: git://git.gluster.com/glusterfs.git
Jul 23 09:53:07 gfs03 nfs[31243]: signal received: 11
Jul 23 09:53:07 gfs03 nfs[31243]: time of crash:
Jul 23 09:53:07 gfs03 nfs[31243]: 2016-07-23 06:53:07
Jul 23 09:53:07 gfs03 nfs[31243]: configuration details:
Jul 23 09:53:07 gfs03 nfs[31243]: argp 1
Jul 23 09:53:07 gfs03 nfs[31243]: backtrace 1
Jul 23 09:53:07 gfs03 nfs[31243]: dlfcn 1
Jul 23 09:53:07 gfs03 nfs[31243]: libpthread 1
Jul 23 09:53:07 gfs03 nfs[31243]: llistxattr 1
Jul 23 09:53:07 gfs03 nfs[31243]: setfsid 1
Jul 23 09:53:07 gfs03 nfs[31243]: spinlock 1
Jul 23 09:53:07 gfs03 nfs[31243]: epoll.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: xattr.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: st_atim.tv_nsec 1
Jul 23 09:53:07 gfs03 nfs[31243]: package-string: glusterfs 3.7.13
#####
nfs.log:
[2016-07-23 05:59:19.961654] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-18: Connected to vlm01-client-18, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.961670] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-18: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.961717] I [MSGID: 108005] [afr-common.c:4142:afr_notify] 0-vlm01-replicate-6: Subvolume 'vlm01-client-18' came back up; going online.
[2016-07-23 05:59:19.961854] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-18: Server lk version = 1
[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-vlm01-client-20: changing port to 49164 (from 0)
[2016-07-23 05:59:19.964637] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-19: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.965956] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-19: Connected to vlm01-client-19, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.965989] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-19: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.966140] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-19: Server lk version = 1
[2016-07-23 05:59:19.967605] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-20: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.967919] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-20: Connected to vlm01-client-20, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.967943] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-20: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.968107] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-20: Server lk version = 1
[2016-07-23 05:59:19.973053] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-17: Connected to vlm01-client-17, attached to remote volume '/bricks/b06/vlm01'.
[2016-07-23 05:59:19.973081] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-17: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.973582] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-17: Server lk version = 1
[2016-07-23 05:59:19.974557] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-0: selecting local read_child vlm01-client-2
[2016-07-23 05:59:19.976100] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-1: selecting local read_child vlm01-client-5
[2016-07-23 05:59:19.976161] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-2: selecting local read_child vlm01-client-8
[2016-07-23 05:59:19.976583] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-3: selecting local read_child vlm01-client-11
[2016-07-23 05:59:19.976640] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-4: selecting local read_child vlm01-client-14
[2016-07-23 05:59:19.976676] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-5: selecting local read_child vlm01-client-17
[2016-07-23 05:59:19.976879] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-6: selecting local read_child vlm01-client-20
[2016-07-23 05:59:36.360646] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:36.962314] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk~ (hash=vlm01-replicate-6/cache=vlm01-replicate-6) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk (hash=vlm01-replicate-3/cache=vlm01-replicate-6)
[2016-07-23 05:59:37.019564] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
The message "I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)" repeated 2 times between [2016-07-23 05:59:37.019564] and [2016-07-23 05:59:37.421227]
[2016-07-23 05:59:38.757822] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:39.950960] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk~ (hash=vlm01-replicate-5/cache=vlm01-replicate-5) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-5)
[2016-07-23 06:00:03.048266] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk~ (hash=vlm01-replicate-2/cache=vlm01-replicate-2) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-2)
[2016-07-23 06:00:07.994953] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 4/PRTG 4.vmx => (XID: 8439cb9c, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.85:676) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:16:48.221237] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-12: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221231] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-13: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221382] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-14: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221878] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 => (XID: 8441a50a, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:17:11.343148] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343170] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343234] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343596] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 => (XID: 51e43a2f, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-07-23 06:17:21.393996] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:50:11.441462] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441471] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441530] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441959] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 => (XID: 51ea9a6e, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:50:21.712570] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-07-23 06:53:07
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.13
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f74cbde32f2]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f74cbe08aad]
/lib64/libc.so.6(+0x35670)[0x7f74ca4cf670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f74cac50210]
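
To get an overview of which warnings and errors precede the crash, the log lines can be tallied by severity, MSGID and source function. This is a rough sketch: the regular expression is inferred only from the line format visible in the excerpt above, the helper name tally is invented here, and the sample list holds a few lines copied from that excerpt.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt:
# [timestamp] SEVERITY [MSGID: n] [file.c:line:function] ...
# The MSGID part is optional because some lines (e.g. rpc-clnt.c) lack it.
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<sev>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\]"
)


def tally(lines):
    """Count W and E entries by (severity, msgid, function)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("sev") in ("W", "E"):
            function = match.group("src").split(":")[-1]
            counts[(match.group("sev"), match.group("msgid"), function)] += 1
    return counts


sample = [
    "[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-vlm01-client-20: changing port to 49164 (from 0)",
    "[2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.85:676) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364",
    "[2016-07-23 06:16:48.221237] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-12: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]",
    "frame : type(0) op(0)",
]

for key, count in tally(sample).most_common():
    print(count, *key)
```

Informational lines and the crash dump lines (frame, signal received, and so on) do not match the filter and are skipped, so only W and E entries are counted.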
I appreciate your help guys.
Respectfully
Mahdi A. Mahdi