Bricks crashing in 3.7.1

Hi,
yesterday I got a strange crash on almost all bricks, the same type of crash repeated on each of them:

[2015-06-09 18:23:56.407520] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.407580] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-10.roma1.infn.it-7546-2015/06/09-18:23:55:618600-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.407707] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.407772] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-09.roma1.infn.it-25429-2015/06/09-18:18:57:328935-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.415905] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c3deedb5-893f-41fb-8c33-9ae23a0e9d27
[2015-06-09 18:23:56.415947] I [server-handshake.c:585:server_setvolume] 0-atlas-data-01-server: accepted client from atlas-storage-10.roma1.infn.it-7530-2015/06/09-18:23:54:608880-atlas-data-01-client-0-0-0 (version: 3.7.1)
[2015-06-09 18:23:56.433956] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-data-01-posix: could not read the link from the gfid handle /bricks/atlas/data01/data/.glusterfs/74/4b/744b7cf0-258f-4dea-b4d9-7001bb21ca56 (No such file or directory)
[2015-06-09 18:23:56.433954] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-data-01-posix: could not read the link from the gfid handle /bricks/atlas/data01/data/.glusterfs/74/4b/744b7cf0-258f-4dea-b4d9-7001bb21ca56 (No such file or directory)
pending frames:
frame : type(0) op(11)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-06-09 18:23:56
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.1
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f0f6446ed92]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f0f644899ed]
/lib64/libc.so.6(+0x35650)[0x7f0f62e60650]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(upcall_cache_invalidate+0xb5)[0x7f0f5537cab5]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(up_readdir_cbk+0x1a2)[0x7f0f55376292]
/usr/lib64/glusterfs/3.7.1/xlator/features/locks.so(pl_readdirp_cbk+0x164)[0x7f0f5558dc94]
/usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so(posix_acl_readdirp_cbk+0x299)[0x7f0f557a6829]
/usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so(br_stub_readdirp_cbk+0x181)[0x7f0f559b5fb1]
/usr/lib64/glusterfs/3.7.1/xlator/storage/posix.so(posix_readdirp+0x143)[0x7f0f56f0cfc3]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/lib64/libglusterfs.so.0(default_readdirp+0x75)[0x7f0f644736a5]
/usr/lib64/glusterfs/3.7.1/xlator/features/bitrot-stub.so(br_stub_readdirp+0x246)[0x7f0f559b0d46]
/usr/lib64/glusterfs/3.7.1/xlator/features/access-control.so(posix_acl_readdirp+0x18d)[0x7f0f557a45cd]
/usr/lib64/glusterfs/3.7.1/xlator/features/locks.so(pl_readdirp+0x14e)[0x7f0f5558c7ee]
/usr/lib64/glusterfs/3.7.1/xlator/features/upcall.so(up_readdirp+0x17a)[0x7f0f5537abfa]
/lib64/libglusterfs.so.0(default_readdirp_resume+0x134)[0x7f0f644809e4]
/lib64/libglusterfs.so.0(call_resume+0x7d)[0x7f0f64498c7d]
/usr/lib64/glusterfs/3.7.1/xlator/performance/io-threads.so(iot_worker+0x123)[0x7f0f5516b353]
/lib64/libpthread.so.0(+0x7df5)[0x7f0f635dadf5]
/lib64/libc.so.6(clone+0x6d)[0x7f0f62f211ad]
---------


I'm not sure the missing gfid handle is the culprit, but if it is, how can I fix it? For the moment I've recreated the bricks from a backup, so I'm fine, but it would be nice to understand what to do if it happens again. I still have the contents of the old crashed brick.
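In case it helps, this is how I was planning to inspect the old brick copy (just my own guess at the right checks; the handle path is the one from the log above, and <suspect-path> is only a placeholder for a directory I suspect):

# ls -l /bricks/atlas/data01/data/.glusterfs/74/4b/744b7cf0-258f-4dea-b4d9-7001bb21ca56
# getfattr -n trusted.gfid -e hex /bricks/atlas/data01/data/<suspect-path>

As far as I understand the .glusterfs layout, directories get a symlink handle while regular files are hardlinked, so the "could not read the link" message seems to point at a directory whose handle symlink is gone. If that's right, would triggering a heal from a good replica (gluster volume heal atlas-data-01 full) be enough to recreate it, or does the handle have to be fixed by hand?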
The crash happened in exactly the same way every time I restarted glusterd.
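Since it is reproducible, I can also try to pull a full backtrace from a core dump, assuming one gets written (the core path below is just a placeholder, it depends on the system settings):

# gdb /usr/sbin/glusterfsd /path/to/core
(gdb) thread apply all bt full

Let me know if that would be useful.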
I’m using gluster 3.7.1 on CentOS 7.1, with the following kind of configuration:

# gluster volume info atlas-data-01
 
Volume Name: atlas-data-01
Type: Replicate
Volume ID: 854620a1-3e88-4e76-91ce-486996bf6a12
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/bricks/atlas/data01/data
Brick2: node2:/bricks/atlas/data01/data
Brick3: node3:/bricks/atlas/data02/data
Options Reconfigured:
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: true
server.allow-insecure: on
ganesha.enable: off
nfs-ganesha: disable


I was playing with ganesha and tried to enable it on the volumes (but failed, as you can see from my other messages). I'm not sure whether that is related, but all the crashed bricks belonged to the volumes where I tried to enable ganesha.
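The backtrace above ends in upcall_cache_invalidate in the upcall translator, which as far as I know is what ganesha relies on for cache invalidation. If the two are indeed related, would switching cache invalidation off on the affected volumes be a reasonable workaround, something like:

# gluster volume set atlas-data-01 features.cache-invalidation off

(this is just my guess at the relevant option, I haven't verified it is the right one).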
Thanks,

	Alessandro


