Hi RafiKC,
Thanks for replying. I've attached a tar.gz file with the code and a repro.txt to help reproduce the issue.
One point to note as well: our original mount options were as follows:
- mount -t glusterfs -o acl <host>:/volume1 /mnt/...
Introducing the entry-timeout option (mount -t glusterfs -o acl,entry-timeout=0) does away with the stale file handle error. The Ubuntu FUSE documentation describes entry_timeout as "the timeout in seconds for which name lookups will be cached". I suppose this comes with a performance hit; any insight on this from Gluster's point of view is appreciated as well.
Log snippet from brick/data-glusterfs-volume1-brick1-brick.log (please let me know if there is another log file I can get information from)
[2016-03-24 00:23:53.718403] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (2af5d8c7-2321-4d77-bda9-ad883ae8d230) is not found. anonymous fd creation failed
[2016-03-24 00:23:53.718543] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (2af5d8c7-2321-4d77-bda9-ad883ae8d230) is not found. anonymous fd creation failed
[2016-03-24 00:23:53.718573] I [server-rpc-fops.c:1235:server_fstat_cbk] 0-volume1-server: 15309: FSTAT -2 (2af5d8c7-2321-4d77-bda9-ad883ae8d230) ==> (No such file or directory)
[2016-03-24 00:24:09.679523] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (02b2e084-261b-45c5-8283-d7babd219a4d) is not found. anonymous fd creation failed
[2016-03-24 00:24:09.679602] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (02b2e084-261b-45c5-8283-d7babd219a4d) is not found. anonymous fd creation failed
[2016-03-24 00:24:09.679618] I [server-rpc-fops.c:1235:server_fstat_cbk] 0-volume1-server: 23548: FSTAT -2 (02b2e084-261b-45c5-8283-d7babd219a4d) ==> (No such file or directory)
[2016-03-24 00:24:53.789620] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (74b2fd0e-966a-40fd-9554-bf30e2c0b7ea) is not found. anonymous fd creation failed
[2016-03-24 00:24:53.789694] W [server-resolve.c:437:resolve_anonfd_simple] 0-server: inode for the gfid (74b2fd0e-966a-40fd-9554-bf30e2c0b7ea) is not found. anonymous fd creation failed
[2016-03-24 00:24:53.789709] I [server-rpc-fops.c:1235:server_fstat_cbk] 0-volume1-server: 38694: FSTAT -2 (74b2fd0e-966a-40fd-9554-bf30e2c0b7ea) ==> (No such file or directory)
===
Log snippet from /var/log/glusterfs/mnt-repovolume1.log (on client side)
[2016-03-24 00:24:53.747930] I [dht-rename.c:1344:dht_rename] 0-volume1-dht: renaming /hhh/master.lock (hash=volume1-replicate-0/cache=volume1-replicate-0) => /hhh/master (hash=volume1-replicate-0/cache=volume1-replicate-0)
[2016-03-24 00:24:53.760365] I [dht-rename.c:1344:dht_rename] 0-volume1-dht: renaming /hhh/master.lock (hash=volume1-replicate-0/cache=volume1-replicate-0) => /hhh/master (hash=volume1-replicate-0/cache=volume1-replicate-0)
[2016-03-24 00:24:53.778746] W [client-rpc-fops.c:1472:client3_3_fstat_cbk] 0-volume1-client-1: remote operation failed: No such file or directory
[2016-03-24 00:24:53.779381] W [MSGID: 108008] [afr-read-txn.c:225:afr_read_txn] 0-volume1-replicate-0: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2016-03-24 00:24:53.779816] E [dht-helper.c:940:dht_migration_complete_check_task] 0-volume1-dht: (null): failed to get the 'linkto' xattr Stale file handle
Log snippet from my program
[master] Size: 64 bytes. nlink 1, Inode: 8be6731fd9f9fbe4 File Permissions: -rw-r--r--
commit() filename = master, lock_filename = master.lock
read(master) :: Stale file handle
Thanks
Rama
On Tue, Mar 22, 2016 at 11:25 PM, Mohammed Rafi K C <rkavunga@xxxxxxxxxx> wrote:
On 03/23/2016 04:10 AM, Rama Shenai wrote:
Hi, We had some questions with respect to expectations of atomicity of rename in gluster.
To elaborate :
We have a setup with two machines (let's call them M1 and M2) which access a file (F) on a gluster volume (mounted by both M1 and M2).
A program does the following steps sequentially on each of the two machines (M1 & M2) in an infinite loop:
1) Opens the file F in O_RDWR|O_EXCL mode, reads some data and closes F
2) Renames some other file F' => F
Periodically either M1 or M2 sees a "Stale file handle" error when it tries to read the file (step (1)) after opening the file in O_RDWR|O_EXCL (the open is successful).
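For readers without the attachment, one iteration of the loop described above might look roughly like the following. This is only a sketch, not the attached program: the function name read_then_rename, the 4096-byte read size, and the assumption that F' is a ".lock" file created exclusively next to F are all illustrative. O_EXCL is left out of the open of F because POSIX only defines it in combination with O_CREAT.

```python
import os


def read_then_rename(directory, name="master", lock_suffix=".lock"):
    """Run one iteration: read F, then rename F' over F.

    Hypothetical helper; file names mirror the logs in this thread.
    """
    target = os.path.join(directory, name)
    lock_path = target + lock_suffix

    # Step 1: open F read-write, read some data, close it.
    fd = os.open(target, os.O_RDWR)
    try:
        data = os.read(fd, 4096)  # the reported ESTALE surfaces on this read
    finally:
        os.close(fd)

    # Step 2: create F' (assumed here to be a lock file) and rename it to F.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    os.rename(lock_path, target)
    return data
```

Running this function in a loop on two clients that mount the same volume would approximate the scenario; on a local filesystem it simply succeeds every time.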
The specific error reported in the client volume logs (/var/log/glusterfs/mnt-repos-volume1.log):
[2016-03-21 16:53:17.897902] I [dht-rename.c:1344:dht_rename] 0-volume1-dht: renaming master.lock (hash=volume1-replicate-0/cache=volume1-replicate-0) => master (hash=volume1-replicate-0/cache=<nul>)
[2016-03-21 16:53:18.735090] W [client-rpc-fops.c:504:client3_3_stat_cbk] 0-volume1-client-0: remote operation failed: Stale file handle
Hi Rama,
An ESTALE error on rename is normally generated when either the source file is not resolvable (deleted or inaccessible) or the parent of the destination is not resolvable. It can happen when, say, file F' was present when your application did a lookup before the rename, but it was renamed by node M1 before M2 could rename it. Basically, a race between two renames of the same file can result in ESTALE for either one.
To confirm this, can you please paste the log messages from the brick "0-volume1-client-0"? You can find the brick name from the graph.
Also, if you can share the program or snippet used to reproduce this issue, that would be great.
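If the race described above is the cause, one application-side way to tolerate it is to reopen the file by path and retry when ESTALE is returned. The sketch below is only an illustration under that assumption: the helper name read_with_estale_retry, the attempt count, and the delay are hypothetical, and whether a retry actually clears the error on a FUSE mount with cached entries has not been established in this thread.

```python
import errno
import os
import time


def read_with_estale_retry(path, size=4096, attempts=5, delay=0.05):
    """Open and read `path`, reopening it if the call fails with ESTALE.

    Hypothetical helper; any other error, or ESTALE on the last
    attempt, is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            fd = os.open(path, os.O_RDONLY)
            try:
                return os.read(fd, size)
            finally:
                os.close(fd)
        except OSError as exc:
            if exc.errno != errno.ESTALE or attempt == attempts - 1:
                raise
            # Give the competing rename time to settle before retrying.
            time.sleep(delay)
```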
Rafi KC
We see no error when we have two processes of the above program running on the same machine (say on M1) accessing the file F on the gluster volume. We want to understand the expectations of atomicity in gluster, specifically for rename, and whether the above is a bug.
Also glusterfs --version => glusterfs 3.6.9 built on Mar 2 2016 18:21:14
We would also like to know if there is any parameter in one of the translators that we can tweak to prevent this problem.
Any help or insights here are appreciated.
Thanks
Rama
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users
Attachment:
stale-repro.tar.gz
Description: GNU Zip compressed data