Gluster NFS crash

Hi, I'm having issues with Gluster NFS: it keeps crashing after a few hours under medium load.

OS: CentOS 7.2
Gluster version 3.7.13

Gluster info:
Volume Name: vlm01
Type: Distributed-Replicate
Volume ID: eacd8248-dca3-4530-9aed-7714a5a114f2
Status: Started
Number of Bricks: 7 x 3 = 21
Transport-type: tcp
Bricks:
Brick1: gfs01:/bricks/b01/vlm01
Brick2: gfs02:/bricks/b01/vlm01
Brick3: gfs03:/bricks/b01/vlm01
Brick4: gfs01:/bricks/b02/vlm01
Brick5: gfs02:/bricks/b02/vlm01
Brick6: gfs03:/bricks/b02/vlm01
Brick7: gfs01:/bricks/b03/vlm01
Brick8: gfs02:/bricks/b03/vlm01
Brick9: gfs03:/bricks/b03/vlm01
Brick10: gfs01:/bricks/b04/vlm01
Brick11: gfs02:/bricks/b04/vlm01
Brick12: gfs03:/bricks/b04/vlm01
Brick13: gfs01:/bricks/b05/vlm01
Brick14: gfs02:/bricks/b05/vlm01
Brick15: gfs03:/bricks/b05/vlm01
Brick16: gfs01:/bricks/b06/vlm01
Brick17: gfs02:/bricks/b06/vlm01
Brick18: gfs03:/bricks/b06/vlm01
Brick19: gfs01:/bricks/b07/vlm01
Brick20: gfs02:/bricks/b07/vlm01
Brick21: gfs03:/bricks/b07/vlm01
Options Reconfigured:
auth.allow: 192.168.221.50,192.168.221.51,192.168.221.52,192.168.221.56
features.shard: on
features.shard-block-size: 16MB
cluster.self-heal-window-size: 128
cluster.data-self-heal-algorithm: full
performance.write-behind: off
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
network.ping-timeout: 10
#####
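
To make the layout above easier to read: a minimal sketch, assuming the usual Distributed-Replicate convention that bricks are grouped into replica sets in the order listed (7 sets of 3 here), and assuming that the `vlm01-client-N` names seen in nfs.log are zero-based indexes into that same brick list. The helper names `replica_sets` and `locate` are made up for illustration and are not part of Gluster.

```python
# Hypothetical helpers for reading the "7 x 3 = 21" layout; not a Gluster API.

def replica_sets(bricks, replica_count):
    """Group an ordered brick list into consecutive replica sets."""
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]

# Rebuild the brick list in the same order as the volume info output.
hosts = ["gfs01", "gfs02", "gfs03"]
bricks = [f"{host}:/bricks/b{n:02d}/vlm01"
          for n in range(1, 8)
          for host in hosts]

sets = replica_sets(bricks, 3)

def locate(client_index, replica_count=3):
    """Map a zero-based client index (assumed naming) to (replica set, brick)."""
    return client_index // replica_count, bricks[client_index]

print(len(sets))    # number of replica sets
print(locate(18))   # which set and brick 'vlm01-client-18' would refer to
```

Under these assumptions, `vlm01-client-18` through `vlm01-client-20` land in `vlm01-replicate-6` on the `b07` bricks, which agrees with the "attached to remote volume '/bricks/b07/vlm01'" lines in nfs.log.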


Gluster status:
Status of volume: vlm01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gfs01:/bricks/b01/vlm01               49159     0          Y       3050 
Brick gfs02:/bricks/b01/vlm01               49158     0          Y       3012 
Brick gfs03:/bricks/b01/vlm01               49158     0          Y       3889 
Brick gfs01:/bricks/b02/vlm01               49160     0          Y       3058 
Brick gfs02:/bricks/b02/vlm01               49159     0          Y       3011 
Brick gfs03:/bricks/b02/vlm01               49159     0          Y       3888 
Brick gfs01:/bricks/b03/vlm01               49161     0          Y       3067 
Brick gfs02:/bricks/b03/vlm01               49160     0          Y       3024 
Brick gfs03:/bricks/b03/vlm01               49160     0          Y       3899 
Brick gfs01:/bricks/b04/vlm01               49162     0          Y       3057 
Brick gfs02:/bricks/b04/vlm01               49161     0          Y       3035 
Brick gfs03:/bricks/b04/vlm01               49161     0          Y       3898 
Brick gfs01:/bricks/b05/vlm01               49163     0          Y       3075 
Brick gfs02:/bricks/b05/vlm01               49162     0          Y       3030 
Brick gfs03:/bricks/b05/vlm01               49162     0          Y       3914 
Brick gfs01:/bricks/b06/vlm01               49164     0          Y       3091 
Brick gfs02:/bricks/b06/vlm01               49163     0          Y       3048 
Brick gfs03:/bricks/b06/vlm01               49163     0          Y       3913 
Brick gfs01:/bricks/b07/vlm01               49165     0          Y       3080 
Brick gfs02:/bricks/b07/vlm01               49164     0          Y       3042 
Brick gfs03:/bricks/b07/vlm01               49164     0          Y       3908 
NFS Server on localhost                     2049      0          Y       28926
Self-heal Daemon on localhost               N/A       N/A        Y       28934
NFS Server on gfs02                         2049      0          Y       9944 
Self-heal Daemon on gfs02                   N/A       N/A        Y       9953 
NFS Server on gfs01                         2049      0          Y       46993
Self-heal Daemon on gfs01                   N/A       N/A        Y       47003
 
Task Status of Volume vlm01
------------------------------------------------------------------------------
There are no active volume tasks
#####


dmesg:
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: frame : type(0) op(0)
Jul 23 09:53:07 gfs03 nfs[31243]: patchset: git://git.gluster.com/glusterfs.git
Jul 23 09:53:07 gfs03 nfs[31243]: signal received: 11
Jul 23 09:53:07 gfs03 nfs[31243]: time of crash:
Jul 23 09:53:07 gfs03 nfs[31243]: 2016-07-23 06:53:07
Jul 23 09:53:07 gfs03 nfs[31243]: configuration details:
Jul 23 09:53:07 gfs03 nfs[31243]: argp 1
Jul 23 09:53:07 gfs03 nfs[31243]: backtrace 1
Jul 23 09:53:07 gfs03 nfs[31243]: dlfcn 1
Jul 23 09:53:07 gfs03 nfs[31243]: libpthread 1
Jul 23 09:53:07 gfs03 nfs[31243]: llistxattr 1
Jul 23 09:53:07 gfs03 nfs[31243]: setfsid 1
Jul 23 09:53:07 gfs03 nfs[31243]: spinlock 1
Jul 23 09:53:07 gfs03 nfs[31243]: epoll.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: xattr.h 1
Jul 23 09:53:07 gfs03 nfs[31243]: st_atim.tv_nsec 1
Jul 23 09:53:07 gfs03 nfs[31243]: package-string: glusterfs 3.7.13
#####


nfs.log:
[2016-07-23 05:59:19.961654] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-18: Connected to vlm01-client-18, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.961670] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-18: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.961717] I [MSGID: 108005] [afr-common.c:4142:afr_notify] 0-vlm01-replicate-6: Subvolume 'vlm01-client-18' came back up; going online.
[2016-07-23 05:59:19.961854] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-18: Server lk version = 1
[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-vlm01-client-20: changing port to 49164 (from 0)
[2016-07-23 05:59:19.964637] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-19: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.965956] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-19: Connected to vlm01-client-19, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.965989] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-19: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.966140] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-19: Server lk version = 1
[2016-07-23 05:59:19.967605] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vlm01-client-20: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-23 05:59:19.967919] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-20: Connected to vlm01-client-20, attached to remote volume '/bricks/b07/vlm01'.
[2016-07-23 05:59:19.967943] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-20: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.968107] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-20: Server lk version = 1
[2016-07-23 05:59:19.973053] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vlm01-client-17: Connected to vlm01-client-17, attached to remote volume '/bricks/b06/vlm01'.
[2016-07-23 05:59:19.973081] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vlm01-client-17: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-23 05:59:19.973582] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vlm01-client-17: Server lk version = 1
[2016-07-23 05:59:19.974557] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-0: selecting local read_child vlm01-client-2
[2016-07-23 05:59:19.976100] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-1: selecting local read_child vlm01-client-5
[2016-07-23 05:59:19.976161] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-2: selecting local read_child vlm01-client-8
[2016-07-23 05:59:19.976583] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-3: selecting local read_child vlm01-client-11
[2016-07-23 05:59:19.976640] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-4: selecting local read_child vlm01-client-14
[2016-07-23 05:59:19.976676] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-5: selecting local read_child vlm01-client-17
[2016-07-23 05:59:19.976879] I [MSGID: 108031] [afr-common.c:1913:afr_local_discovery_cbk] 0-vlm01-replicate-6: selecting local read_child vlm01-client-20
[2016-07-23 05:59:36.360646] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:36.962314] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk~ (hash=vlm01-replicate-6/cache=vlm01-replicate-6) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15_1.vmdk (hash=vlm01-replicate-3/cache=vlm01-replicate-6)
[2016-07-23 05:59:37.019564] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
The message "I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)" repeated 2 times between [2016-07-23 05:59:37.019564] and [2016-07-23 05:59:37.421227]
[2016-07-23 05:59:38.757822] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx~ (hash=vlm01-replicate-0/cache=vlm01-replicate-0) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmx (hash=vlm01-replicate-2/cache=vlm01-replicate-0)
[2016-07-23 05:59:39.950960] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk~ (hash=vlm01-replicate-5/cache=vlm01-replicate-5) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 40/PRTG 40.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-5)
[2016-07-23 06:00:03.048266] I [MSGID: 109066] [dht-rename.c:1568:dht_rename] 0-vlm01-dht: renaming <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk~ (hash=vlm01-replicate-2/cache=vlm01-replicate-2) => <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 15/PRTG 15.vmdk (hash=vlm01-replicate-5/cache=vlm01-replicate-2)
[2016-07-23 06:00:07.994953] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/PRTG 4/PRTG 4.vmx => (XID: 8439cb9c, LOOKUP: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.85:676) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:16:48.221237] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-12: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221231] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-13: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221382] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-14: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 [File exists]
[2016-07-23 06:16:48.221878] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.lck-c7607978ef6c5b99 => (XID: 8441a50a, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:17:11.343148] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343170] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343234] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 [File exists]
[2016-07-23 06:17:11.343596] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-aa088f801cb21f94 => (XID: 51e43a2f, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000 [Invalid argument]
[2016-07-23 06:17:21.393996] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
[2016-07-23 06:50:11.441462] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-19: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441471] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-18: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441530] W [MSGID: 114031] [client-rpc-fops.c:2402:client3_3_create_cbk] 0-vlm01-client-20: remote operation failed. Path: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 [File exists]
[2016-07-23 06:50:11.441959] W [MSGID: 112199] [nfs3-helpers.c:3520:nfs3_log_newfh_res] 0-nfs-nfsv3: <gfid:d61aff47-e715-489d-9908-f824789e27b3>/.iorm.sf/.lck-7a24a36f7bded3b0 => (XID: 51ea9a6e, CREATE: NFS: 17(File exists), POSIX: 17(File exists)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000
[2016-07-23 06:50:21.712570] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: No such file or directory: (192.168.208.86:906) vlm01 : a0d6a061-866e-4b75-b3ab-4005e52ed364
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2016-07-23 06:53:07
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.13
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f74cbde32f2]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f74cbe08aad]
/lib64/libc.so.6(+0x35670)[0x7f74ca4cf670]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x7f74cac50210]
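
To pick out the warnings and errors that precede the crash, a rough sketch that counts W and E entries per MSGID and function. It assumes the line format seen in the nfs.log excerpt above (timestamp, severity letter, optional MSGID, source location); the regular expression and the `summarize` name are my own and not part of any Gluster tool. Lines that do not match, such as the `frame : type(0) op(0)` crash dump lines, are skipped.

```python
import re
from collections import Counter

# Pattern for lines like:
# [2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] 0-nfs-nfsv3: ...
# The MSGID part is optional because some entries (e.g. rpc-clnt.c) omit it.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<rest>.*)$"
)

def summarize(lines):
    """Count warning/error entries keyed by (level, msgid, function)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("level") not in ("W", "E"):
            continue
        # The source field looks like "file.c:line:function"; keep the function.
        function = match.group("src").split(":")[-1]
        counts[(match.group("level"), match.group("msgid"), function)] += 1
    return counts

sample = [
    "[2016-07-23 06:01:02.831132] E [MSGID: 112069] [nfs3.c:3483:nfs3_remove_resume] "
    "0-nfs-nfsv3: No such file or directory: (192.168.208.85:676) vlm01 : "
    "a0d6a061-866e-4b75-b3ab-4005e52ed364",
    "[2016-07-23 05:59:19.962027] I [rpc-clnt.c:1868:rpc_clnt_reconfig] "
    "0-vlm01-client-20: changing port to 49164 (from 0)",
    "frame : type(0) op(0)",
]

for key, count in summarize(sample).most_common():
    print(count, key)
```

To run it against the real file, pass an open file handle instead of `sample`, for example `summarize(open(path))` with `path` set to wherever nfs.log lives on the node.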


I appreciate your help, guys.

Respectfully
Mahdi A. Mahdi


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
