Killing glusterfsd on server for fun

Hi 

I am still testing high availability on release-3.3. I have a two-replica
setup, and it is busy building NetBSD [1]. If I kill glusterfsd on one server,
leaving the other one alive, all ongoing operations fail,
and the mount is wrecked beyond recovery:

client$ ls
ls: .: Socket is not connected
client$ cd /
client$ cd -
/bin/ksh: cd: /pfs/manu/netbsd/usr/src - Socket is not connected

I have to restart the stopped brick in order to use the mount again.
I tried reproducing the issue with something simpler than a huge build,
without success so far: the mount survives a brick being stopped, and
I do not even see failures on ongoing operations. I would appreciate
some hints on how to reproduce the problem in a simple test case.

Here is the beginning of the client log at failure time:
[2012-06-18 09:14:24.421370] W [socket.c:1512:__socket_proto_state_machine] 0-pfs-client-0: reading from socket failed. Error (Socket is not connected), peer (193.54.82.99:24010)
[2012-06-18 09:14:24.443002] E [rpc-clnt.c:373:saved_frames_unwind]  0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.516562 (xid=0x9483261x)
[2012-06-18 09:14:24.443448] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.443660] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.443735] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.443903] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435303: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)
[2012-06-18 09:14:24.445857] I [socket.c:2315:socket_submit_request] 0-pfs-client-0: not connected (priv->connected = 0)
[2012-06-18 09:14:24.446190] W [rpc-clnt.c:1498:rpc_clnt_submit] 0-pfs-client-0: failed to submit rpc-request (XID: 0x9483576x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (pfs-client-0)
[2012-06-18 09:14:24.446665] E [rpc-clnt.c:373:saved_frames_unwind]  0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.517840 (xid=0x9483266x)
[2012-06-18 09:14:24.447066] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.447364] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.447666] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.448142] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435323: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)
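
To see which translators and functions dominate a longer client log,
a small script along these lines might help. It is only a sketch: the
regular expression is an assumption inferred from the excerpt above
(timestamp, level letter, file:line:function, translator name, message),
not a documented log format, and the names parse_line and summarize are
made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed layout, inferred from the log excerpt above:
# [timestamp] LEVEL [file:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Count entries per (level, translator, function); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["level"], rec["xlator"], rec["func"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py /path/to/client.log
    with open(sys.argv[1], errors="replace") as fh:
        for key, n in summarize(fh).most_common():
            print(n, *key)
```

Run against the full client log, this prints one line per
(level, translator, function) combination, most frequent first, which
may make it easier to spot what fails first when the brick goes away.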


 
[1] For anyone willing to reproduce, get and unpack
gnusrc.tgz, src.tgz, syssrc.tgz and sharesrc.tgz from
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-5.1.2/source/sets/
Then: cd usr/src && ./build.sh -Uuo release

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx


