Hi gluster developers,

I have encountered a situation where a file cannot be found, even though
it does exist and is on the correct node. The file can be stat()-ed but
not opened. After a Gluster restart the file is accessible again.

GlusterFS: 3.0.3 with an altered hashing function (my own modification).

== On the Gluster mounted volume:

archive@cgmarchive0:~/archive/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 archive archive 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:~/archive/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
wc: www.funkyfish.nl#59493#cgmspider0: No such file or directory

== On the local (node0) volume:

archive@cgmarchive0:/local.mnt/md0/glfs-data/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 vagabond vagabondo 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:/local.mnt/md0/glfs-data/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
10767 www.funkyfish.nl#59493#cgmspider0

== Error log:

[2010-03-31 12:02:47] D [dht-common.c:1590:dht_fd_cbk] dht: subvolume node0 returned -1 (No such file or directory)
[2010-03-31 12:02:47] W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 10346982: OPEN() /incoming/www.funkyfish.nl#59493#cgmspider0 => -1 (No such file or directory)

Then after a Gluster restart (umount/mount sequence):

== On the Gluster mounted volume:

archive@cgmarchive0:~/archive/incoming$ ls -l www.funkyfish.nl#59493#cgmspider0
-rw-rw-r-- 1 archive archive 599065 Mar 30 15:16 www.funkyfish.nl#59493#cgmspider0
archive@cgmarchive0:~/archive/incoming$ wc -l www.funkyfish.nl#59493#cgmspider0
10767 www.funkyfish.nl#59493#cgmspider0

The application access pattern for these files is (a minimal repro loop
is sketched at the end of this message):

* a file is copied onto the filesystem under a temporary name
* the file is renamed to its final name
* the file is read once, then deleted
* the filename is normally not used again, or at least not any time soon

All file operations went through the Gluster fs (no direct local access).

The hashing function has been replaced by one that implements a
'consistent hashing' scheme, adapted so that the temporary filename and
the final filename always hash to the same node (a rough sketch of the
idea is also appended at the end of this message).

The problem is not isolated to a single case, but it takes a long time
(days) to occur. Given enough time it is reproducible, so if you need
more debugging info I can try to extract it for you.

Any ideas?

== Volume file

volume posix
  type storage/posix
  option directory /local.mnt/md0/glfs-data
end-volume

volume locks
  type features/posix-locks
  subvolumes posix
end-volume

volume fixed-id
  type features/filter
  option fixed-uid 2224
  option fixed-gid 224
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  subvolumes fixed-id
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow 10.0.0.*,10.1.0.*
  subvolumes brick
end-volume

volume node0
  type protocol/client
  option transport-type tcp
  option remote-host cgmarchive0
  option remote-subvolume brick
end-volume

volume node1
  type protocol/client
  option transport-type tcp
  option remote-host cgmarchive1
  option remote-subvolume brick
end-volume

volume dht
  type cluster/dht
  subvolumes node0 node1
end-volume

--
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl
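== Appendix 1: access pattern repro sketch

To make the access pattern concrete, here is a minimal C loop that mimics
it. This is illustrative only: the mount point, the naming scheme, and the
file contents are placeholders, not our actual application code. It runs
until an operation fails, since the bug takes days to appear.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    const char *dir = "/mnt/gluster/incoming";  /* placeholder mount point */
    char tmp[512], final[512], buf[4096];
    unsigned long i;

    for (i = 0; ; i++) {
        snprintf(tmp, sizeof(tmp), "%s/.tmp.%lu", dir, i);
        snprintf(final, sizeof(final), "%s/file.%lu", dir, i);

        /* 1. copy the file in under a temporary name */
        int fd = open(tmp, O_CREAT | O_WRONLY | O_TRUNC, 0664);
        if (fd < 0) { perror("open(tmp)"); return 1; }
        memset(buf, 'x', sizeof(buf));
        if (write(fd, buf, sizeof(buf)) < 0) { perror("write"); return 1; }
        close(fd);

        /* 2. rename it to its final name */
        if (rename(tmp, final) < 0) { perror("rename"); return 1; }

        /* stat() succeeds even once the bug has been hit ... */
        struct stat st;
        if (stat(final, &st) < 0) { perror("stat"); return 1; }

        /* 3. read it once -- this is the open() that fails with ENOENT */
        fd = open(final, O_RDONLY);
        if (fd < 0) { perror("open(final)"); return 1; }
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        close(fd);

        /* 4. delete it; the name is not reused any time soon */
        if (unlink(final) < 0) { perror("unlink"); return 1; }
    }
    return 0;
}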
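== Appendix 2: hash adaptation sketch

A rough sketch of the idea behind the replacement hash -- not the actual
implementation. The temp-name convention (a ".tmp" suffix), the FNV-1a
hash, and the 64 virtual points per node are stand-ins chosen for
illustration. The two relevant properties are: (1) a temporary name is
normalized to its final name before hashing, so both always land on the
same node, and (2) node selection is a consistent-hashing ring lookup
rather than the stock dht layout.

#include <stdio.h>
#include <string.h>

#define NODES 2
#define TMP_SUFFIX ".tmp"   /* placeholder temp-name convention */

/* FNV-1a, standing in for whatever hash one prefers */
static unsigned int hash_str(const char *s, size_t len)
{
    unsigned int h = 2166136261u;
    while (len--)
        h = (h ^ (unsigned char)*s++) * 16777619u;
    return h;
}

/* Hash the name with any temporary suffix stripped, so that
 * "foo.tmp" and "foo" always hash identically. */
static unsigned int hash_normalized(const char *name)
{
    size_t len = strlen(name), sfx = strlen(TMP_SUFFIX);
    if (len > sfx && strcmp(name + len - sfx, TMP_SUFFIX) == 0)
        len -= sfx;
    return hash_str(name, len);
}

/* Toy consistent-hash lookup: each node owns 64 virtual points on a
 * ring; a name goes to the node owning the first point at or after
 * its hash, wrapping around to the smallest point if none follows. */
static int pick_node(const char *name)
{
    unsigned int h = hash_normalized(name);
    unsigned int succ = 0, min = 0;
    int succ_node = -1, min_node = -1;
    for (int n = 0; n < NODES; n++) {
        for (int r = 0; r < 64; r++) {
            char label[32];
            snprintf(label, sizeof(label), "node%d-%d", n, r);
            unsigned int p = hash_str(label, strlen(label));
            if (min_node < 0 || p < min) { min = p; min_node = n; }
            if (p >= h && (succ_node < 0 || p < succ)) { succ = p; succ_node = n; }
        }
    }
    return succ_node >= 0 ? succ_node : min_node;
}

int main(void)
{
    const char *name = "www.funkyfish.nl#59493#cgmspider0";
    char tmp[256];
    snprintf(tmp, sizeof(tmp), "%s%s", name, TMP_SUFFIX);

    /* both print the same node, by construction */
    printf("%s -> node%d\n", tmp, pick_node(tmp));
    printf("%s -> node%d\n", name, pick_node(name));
    return 0;
}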