Re: glusterfs client crashes

Hi,

I have experienced what looks like a very similar crash, with Gluster 3.7.6 on CentOS 7. There were no errors on the bricks or on the other clients that were mounted at the time. Load was relatively high when it happened.

Remounting the filesystem brought it back online.

pending frames:
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash:
2016-02-22 10:28:45
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f83387f7012]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f83388134dd]
/lib64/libc.so.6(+0x35670)[0x7f8336ee5670]
/lib64/libc.so.6(gsignal+0x37)[0x7f8336ee55f7]
/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
/lib64/libc.so.6(+0x75317)[0x7f8336f25317]
/lib64/libc.so.6(+0x7cfe1)[0x7f8336f2cfe1]
/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_stat_cbk+0x10c)[0x7f8329c8f4fc]
/lib64/libglusterfs.so.0(default_stat_cbk+0xac)[0x7f83387fcc5c]
/usr/lib64/glusterfs/3.7.6/xlator/cluster/distribute.so(dht_file_attr_cbk+0x149)[0x7f832ab2a409]
/usr/lib64/glusterfs/3.7.6/xlator/protocol/client.so(client3_3_stat_cbk+0x3c6)[0x7f832ad6d266]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7f83385c5b80]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7f83385c5e3f]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f83385c1983]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0x9506)[0x7f832d261506]
/usr/lib64/glusterfs/3.7.6/rpc-transport/socket.so(+0xc3f4)[0x7f832d2643f4]
/lib64/libglusterfs.so.0(+0x878ea)[0x7f83388588ea]
/lib64/libpthread.so.0(+0x7dc5)[0x7f833765fdc5]
/lib64/libc.so.6(clone+0x6d)[0x7f8336fa621d]
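
For anyone comparing dumps like the one above against their own, here is a minimal Python sketch (not part of the original report) that tallies the pending frames by operation and splits each backtrace line into object, symbol and offset. The regular expressions and the helper names count_pending_ops and parse_backtrace are assumptions based only on the line shapes visible in this dump; the sample lines are copied from it.

```python
import re
from collections import Counter

# Assumed line shapes, inferred from the dump above:
#   frame : type(1) op(READ)
#   /path/to/object(symbol+0xOFFSET)[0xADDRESS]   (symbol may be empty)
PENDING_RE = re.compile(r"^frame : type\((\d+)\) op\((\w+)\)$")
FRAME_RE = re.compile(
    r"^(?P<obj>\S+?)\((?P<sym>[^+)]*)\+(?P<off>0x[0-9a-fA-F]+)\)"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def count_pending_ops(lines):
    """Count pending frames per operation name (READ, STAT, ...)."""
    ops = Counter()
    for line in lines:
        m = PENDING_RE.match(line.strip())
        if m:
            ops[m.group(2)] += 1
    return ops


def parse_backtrace(lines):
    """Return (object, symbol or None, offset) for each backtrace line."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            symbol = m.group("sym") or None  # unnamed frames look like (+0x75317)
            frames.append((m.group("obj"), symbol, int(m.group("off"), 16)))
    return frames


# A few lines copied from the dump above, used as sample input.
SAMPLE = """\
frame : type(1) op(READ)
frame : type(1) op(STAT)
frame : type(1) op(READ)
frame : type(0) op(0)
/lib64/libc.so.6(abort+0x148)[0x7f8336ee6ce8]
/lib64/libc.so.6(+0x75317)[0x7f8336f25317]
/lib64/libglusterfs.so.0(loc_wipe+0x27)[0x7f83387f4d47]
/usr/lib64/glusterfs/3.7.6/xlator/performance/md-cache.so(mdc_local_wipe+0x11)[0x7f8329c8e5f1]
""".splitlines()

pending = count_pending_ops(SAMPLE)
frames = parse_backtrace(SAMPLE)
named = [sym for _obj, sym, _off in frames if sym]
print(pending)
print(named)
```

In the trace above, the first named frames below abort() are loc_wipe and mdc_local_wipe (md-cache), so those are the symbols to look for when checking whether another report shows the same crash.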

Kind regards,
Fredrik Widlund


On Tue, Feb 23, 2016 at 1:00 PM, <gluster-users-request@xxxxxxxxxxx> wrote:
Date: Mon, 22 Feb 2016 15:08:47 -0500
From: Dj Merrill <gluster@xxxxxxxx>
To: Gaurav Garg <ggarg@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: glusterfs client crashes
Message-ID: <56CB6ACF.5080408@xxxxxxxx>
Content-Type: text/plain; charset=utf-8; format=flowed

On 2/21/2016 2:23 PM, Dj Merrill wrote:
 > Very interesting.  They were reporting both bricks offline, but the
 > processes on both servers were still running.  Restarting glusterfsd on
 > one of the servers brought them both back online.

I realize I wasn't clear in my comments yesterday and would like to
elaborate on this a bit further. The "very interesting" comment was
sparked because when we were running 3.7.6, the bricks were not
reporting as offline when a client was having an issue, so this is new
behaviour now that we are running 3.7.8 (or a different issue entirely).

The other point that I was not clear on is that we may have one client
reporting the "Transport endpoint is not connected" error, but the other
40+ clients all continue to work properly. This is the case with both
3.7.6 and 3.7.8.

Curious, how can the other clients continue to work fine if both Gluster
3.7.8 servers are reporting the bricks as offline?

What does "offline" mean in this context?


Re: the server logs, here is what I've found so far listed on both
gluster servers (glusterfs1 and glusterfs2):

[2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv]
0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No
data available)
[2016-02-21 18:48:20.677096] I [MSGID: 114018]
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
gv0-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2016-02-21 18:48:31.148564] E [MSGID: 114058]
[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs:
readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec]
0-mgmt: Volume file changed
[2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
0-gv0-client-1: changing port to 49152 (from 0)
[2016-02-21 18:48:53.051991] I [MSGID: 114057]
[client-handshake.c:1437:select_server_supported_programs]
0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-02-21 18:48:53.052439] I [MSGID: 114046]
[client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected
to gv0-client-1, attached to remote volume '/export/brick1/sdb1'.
[2016-02-21 18:48:53.052486] I [MSGID: 114047]
[client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server
and Client lk-version numbers are not same, reopening the fds
[2016-02-21 18:48:53.052668] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1:
Server lk version = 1
[2016-02-21 18:48:31.148706] I [MSGID: 114018]
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
gv0-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs:
readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv]
0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No
data available)
[2016-02-21 18:49:15.637824] I [MSGID: 114018]
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
gv0-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish]
0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed
(Connection refused)
[2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish]
0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed
(Connection refused)
[2016-02-21 18:49:38.366559] I [MSGID: 108031]
[afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting
local read_child gv0-client-0
[2016-02-21 18:50:54.605535] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:50:54.605639] E [MSGID: 114058]
[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
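
Since the log above is hard-wrapped by the mail client, a small script can make it easier to read. The following is a minimal Python sketch, assuming every record starts with a "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL " prefix as in the lines above and that wrapped lines can be rejoined with a single space. The helper names join_records, parse_record and out_of_order are hypothetical; they are not part of any Gluster tool.

```python
import re
from datetime import datetime

# Assumed record prefix, taken from the excerpt above:
#   [2016-02-21 18:48:53.052668] I ...
RECORD_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) "
)
MSGID_RE = re.compile(r"\[MSGID: (\d+)\]")


def join_records(lines):
    """Rejoin mail-wrapped lines into one string per log record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def parse_record(record):
    """Split a record into time, level, optional MSGID and remaining text."""
    m = RECORD_START.match(record)
    if m is None:
        return None
    msgid = MSGID_RE.search(record)
    return {
        "time": datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"),
        "level": m.group(2),
        "msgid": int(msgid.group(1)) if msgid else None,
        "text": record[m.end():],
    }


def out_of_order(parsed):
    """Return records whose timestamp is earlier than the record before them."""
    return [cur for prev, cur in zip(parsed, parsed[1:])
            if cur["time"] < prev["time"]]


# Three records copied from the excerpt above, still wrapped.
SAMPLE_LOG = """\
[2016-02-21 18:48:53.052668] I [MSGID: 114035]
[client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1:
Server lk version = 1
[2016-02-21 18:48:31.148706] I [MSGID: 114018]
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from
gv0-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs:
readv on (sanitized IP of glusterfs2):24007 failed (No data available)
""".splitlines()

records = join_records(SAMPLE_LOG)
parsed = [parse_record(r) for r in records]
late = out_of_order(parsed)
for entry in late:
    print(entry["time"], entry["level"], entry["msgid"])
```

In the excerpt above, the record stamped 18:48:31.148706 appears after records stamped 18:48:53, which is the kind of ordering this check would flag.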
