On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
On 07/18/2014 07:57 PM, Anders Blomdell wrote:
During testing of a 3*4 gluster (from master as of yesterday), I encountered
two major weirdnesses:
1. An 'rm -rf <some_dir>' needed several invocations to finish, each time
reporting a number of lines like these:
rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
This is reproducible for me when running dbench on NFS mounts. I think I may have seen it on glusterfs mounts as well, but it seems more reproducible on NFS. I should have caught it sooner, but it doesn't error out on the client side during cleanup, and on the next test run the deletes succeed. When this happens I see the following in the nfs.log.
This spams the log; from what I can tell it happens while dbench is creating the files:
[2014-07-19 13:37:03.271651] I [MSGID: 109036] [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht: Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name: testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ], [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],
Then, when the deletes fail, I see the following while the client is removing the files:
[2014-07-18 23:31:44.272465] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 74a6541a: /run8063_dbench/clients => -1 (Directory not empty)
.
.
[2014-07-18 23:31:44.452988] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7ea9541a: /run8063_dbench/clients => -1 (Directory not empty)
[2014-07-18 23:31:45.262651] W [client-rpc-fops.c:1354:client3_3_access_cbk] 0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-18 23:31:45.263151] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-07-18 23:31:45.264196] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 32ac541a: <gfid:b073a189-91ea-46b2-b757-5b320591b848> => -1 (Stale file handle)
[2014-07-18 23:31:45.264217] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 32ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
[2014-07-18 23:31:45.266818] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs: 33ac541a: <gfid:b073a189-91ea-46b2-b757-5b320591b848> => -1 (Stale file handle)
[2014-07-18 23:31:45.266853] W [nfs3-helpers.c:3401:nfs3_log_common_res] 0-nfs-nfsv3: XID: 33ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX: 116(Stale file handle)
Occasionally I see:
[2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client: readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a) [0x7f53775128ea] (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75) [0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63) [0x7f5388914693]))) 0-rpc_transport: invalid argument: this
I'm opening a BZ now; I'll leave the systems up and put the repro steps + hostnames in the BZ in case anyone wants to poke around.
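Roughly, the repro is just dbench over an NFS mount followed by the cleanup; something like the below, where the server, volume, and mount point names are placeholders (the real ones will be in the BZ):
# mount -t nfs -o vers=3 server1:/testvol /mnt/nfs
# dbench -D /mnt/nfs -t 60 10
# rm -rf /mnt/nfs/clients
The rm at the end is what eventually hits the 'Directory not empty' errors.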
-b
What are the steps to recreate? We need to first find what led to this, and then probably which xlator is responsible.
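Once it happens, checking the directory on the backend bricks could also help narrow it down, i.e. whether some subvolume still has entries under it; something along these lines, with the brick paths only as an example:
# ls -la /bricks/brick*/run8063_dbench/clients
If one replica set still shows files there while the mount thinks the directory should be gone, that would point more towards dht/afr than nfs.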
2. After having successfully deleted all files from the volume,
I have a single directory that is duplicated in gluster-fuse,
like this:
# ls -l /mnt/gluster
total 24
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
Any idea on how to debug this issue?
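One thing I could try, I guess, is to compare the directory's trusted.gfid on each brick to see whether the subvolumes disagree; the brick paths below are just guesses for my setup:
# getfattr -d -m . -e hex /data/brick*/testvol/work2
If the gfids differ between the distribute subvolumes, that might explain the double listing, but I am not sure that is what is happening here.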
I have not seen this, but I am running on a 6x2 volume. I wonder if this only happens with replica > 2?
Pranith
/Anders
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel