Hi

I hit another rare problem, which seems replicable within 4 hours of usage: one brick goes down, but not completely. It will not create a file, for instance, but it will still participate in file locking and cause it to fail, because it did not create the file.

Here is the final symptom (this creates the file then locks it):

client# echo "xxx"|cat -l > /gfs/foo
cat: stdout: No such file or directory

brick1# ls -l /export/gfs1/foo
-rw-r--r--  2 root  wheel  0 Jul 29 06:18 /export/gfs1/foo

brick2# ls -l /export/gfs1/foo
ls: /export/gfs1/foo: No such file or directory

Client log for this operation:
[2012-07-29 06:18:10.430637] W [client3_1-fops.c:2186:client3_1_lk_cbk] 0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.431628] W [fuse-bridge.c:3196:fuse_setlk_cbk] 0-glusterfs-fuse: 11781877: ERR => -1 (No such file or directory)
[2012-07-29 06:18:10.434844] W [client3_1-fops.c:2186:client3_1_lk_cbk] 0-gfs-client-1: remote operation failed: No such file or directory
[2012-07-29 06:18:10.435939] W [fuse-bridge.c:3196:fuse_setlk_cbk] 0-glusterfs-fuse: 11781880: ERR => -1 (No such file or directory)

brick1 logs nothing. brick2 log for this operation:
[2012-07-29 06:18:10.430151] I [server3_1-fops.c:203:server_lk_cbk] 0-gfs-server: 2017229: LK -2 (--) ==> -1 (No such file or directory)
[2012-07-29 06:18:10.434281] I [server3_1-fops.c:203:server_lk_cbk] 0-gfs-server: 2017231: LK -2 (--) ==> -1 (No such file or directory)

But this is only the consequence of an earlier problem, where brick2 went half-offline: enough to refuse creating files, but not enough to be excluded from locking operations. Here is how it happened:

brick2 log:
[2012-07-28 22:30:08.024578] E [event.c:346:event_dispatch_poll_handler] 0-poll: index not found for fd=15 (idx_hint=6)
[2012-07-28 22:30:18.418768] I [server-handshake.c:571:server_setvolume] 0-gfs-server: accepted client from client-18310-2012/07/27-03:03:28:140183437669610-gfs-client-1-0 (version: 3.3git)

client log:
[2012-07-28 22:30:08.026975] W [socket.c:1512:__socket_proto_state_machine] 0-gfs-client-1: reading from socket failed. Error (Socket is not connected), peer (192.0.2.98:24010)
[2012-07-28 22:30:08.027050] E [rpc-clnt.c:373:saved_frames_unwind] 0-gfs-client-1: forced unwinding frame type(GlusterFS 3.1) op(WRITE(13)) called at 2012-07-28 22:30:08.026783 (xid=0x1990324x)
[2012-07-28 22:30:08.027224] W [client3_1-fops.c:821:client3_1_writev_cbk] 0-gfs-client-1: remote operation failed: Socket is not connected
[2012-07-28 22:30:08.027396] I [client.c:2090:client_rpc_notify] 0-gfs-client-1: disconnected
[2012-07-28 22:30:08.027553] W [client3_1-fops.c:4929:client3_1_fxattrop] 0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1. EBADFD
[2012-07-28 22:30:08.030501] W [client3_1-fops.c:5306:client3_1_finodelk] 0-gfs-client-1: (366a4c92-d167-48e7-844a-9dc43602ecc5) remote_fd is -1. EBADFD
[2012-07-28 22:30:18.419800] I [client-handshake.c:1636:select_server_supported_programs] 0-gfs-client-1: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-07-28 22:30:18.420716] I [client-handshake.c:1433:client_setvolume_cbk] 0-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume '/export/gfs1'.
[2012-07-28 22:30:18.420768] I [client-handshake.c:1454:client_setvolume_cbk] 0-gfs-client-1: Server and Client lk-version numbers are same, no need to reopen the fds

Both client and server tell us that the reconnection went without a hitch, but it seems glusterfs did not really recover.
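For reference, here is a minimal C sketch of what I believe the failing sequence boils down to, in case it helps reproduce it without cat -l. It is only an assumption that cat -l ends up as a plain fcntl() write lock (the fuse_setlk_cbk entries in the client log suggest the setlk path); /gfs/foo is the same path as in the example above.

/*
 * Sketch of a reproducer roughly equivalent to: echo "xxx" | cat -l > /gfs/foo
 * Create a file on the glusterfs mount, then take an exclusive fcntl() lock.
 * Assumptions: the mount is at /gfs and cat -l maps to an F_SETLK request.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct flock fl;
	int fd;

	/* Create the file: it shows up on brick1 but never on brick2. */
	fd = open("/gfs/foo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1) {
		perror("open");
		return 1;
	}

	/* Lock the whole file: the half-offline brick2 still takes part
	 * in the lock and answers ENOENT, so the lock fails. */
	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	if (fcntl(fd, F_SETLK, &fl) == -1)
		perror("fcntl(F_SETLK)");	/* "No such file or directory" here */

	if (write(fd, "xxx\n", 4) == -1)
		perror("write");

	close(fd);
	return 0;
}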
-- Emmanuel Dreyfus manu@xxxxxxxxxx