We have two servers, let's call them file01 and file02. They are synced very frequently, so we can assume their contents are identical. Then we have lots of clients, each of which has two glusterfs mounts, one against each file server. Before you ask, let me say the clients are in a production environment where I can't afford any downtime. To make the migration from glusterfs v1.3 to glusterfs v2.0 as smooth as possible, I recompiled the packages to run under the glusterfs2 name. The servers are running two instances of the glusterfs daemon, and the old one will be stopped once the migration is complete. So you'll see some glusterfs2 names and build dates that may look odd, but as you'll also see, that has nothing to do with this matter.
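To give a clearer picture, each file server runs both daemons side by side during the migration, roughly like this (the glusterfsd2 command line is the one you'll see in the logs below; the old daemon's exact pidfile path I'm quoting from memory, so take it as approximate):

    # old 1.3 daemon, still serving production clients on its usual port
    /usr/sbin/glusterfsd -p /var/run/glusterfsd.pid
    # new 2.0 daemon, listening on the non-default port 6997 (see the volfiles below)
    /usr/sbin/glusterfsd2 -p /var/run/glusterfsd2.pid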
file01 server log:

================================================================================
Version      : glusterfs 2.0.1 built on May 26 2009 05:11:51
TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
Starting Time: 2009-07-14 18:07:12
Command line : /usr/sbin/glusterfsd2 -p /var/run/glusterfsd2.pid
PID          : 6337
System name  : Linux
Nodename     : file01
Kernel Release : 2.6.18-128.1.14.el5
Hardware Identifier: x86_64

Given volfile:
+------------------------------------------------------------------------------+
1: # The data store directory to serve
2: volume filedata-ds
3: type storage/posix
4: option directory /file/data
5: end-volume
6:
7: # Make the data store read-only
8: volume filedata-readonly
9: type testing/features/filter
10: option read-only on
11: subvolumes filedata-ds
12: end-volume
13:
14: # Optimize
15: volume filedata-iothreads
16: type performance/io-threads
17: option thread-count 64
18: # option autoscaling on
19: # option min-threads 16
20: # option max-threads 256
21: subvolumes filedata-readonly
22: end-volume
23:
24: # Add readahead feature
25: volume filedata
26: type performance/read-ahead   # cache per file = (page-count x page-size)
27: # option page-size 256kB      # 256KB is the default option ?
28: # option page-count 8         # 16 is default option ?
29: subvolumes filedata-iothreads
30: end-volume
31:
32: # Main server section
33: volume server
34: type protocol/server
35: option transport-type tcp
36: option transport.socket.listen-port 6997
37: subvolumes filedata
38: option auth.addr.filedata.allow 192.168.128.*   # streamers
39: option verify-volfile-checksum off              # don't have clients complain
40: end-volume
41:
+------------------------------------------------------------------------------+
[2009-07-14 18:07:12] N [glusterfsd.c:1152:main] glusterfs: Successfully started

file02 server log:

================================================================================
Version      : glusterfs 2.0.1 built on May 26 2009 05:11:51
TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
Starting Time: 2009-06-28 08:42:13
Command line : /usr/sbin/glusterfsd2 -p /var/run/glusterfsd2.pid
PID          : 5846
System name  : Linux
Nodename     : file02
Kernel Release : 2.6.18-92.1.10.el5
Hardware Identifier: x86_64

Given volfile:
+------------------------------------------------------------------------------+
1: # The data store directory to serve
2: volume filedata-ds
3: type storage/posix
4: option directory /file/data
5: end-volume
6:
7: # Make the data store read-only
8: volume filedata-readonly
9: type testing/features/filter
10: option read-only on
11: subvolumes filedata-ds
12: end-volume
13:
14: # Optimize
15: volume filedata-iothreads
16: type performance/io-threads
17: option thread-count 64
18: # option autoscaling on
19: # option min-threads 16
20: # option max-threads 256
21: subvolumes filedata-readonly
22: end-volume
23:
24: # Add readahead feature
25: volume filedata
26: type performance/read-ahead   # cache per file = (page-count x page-size)
27: # option page-size 256kB      # 256KB is the default option ?
28: # option page-count 8         # 16 is default option ?
29: subvolumes filedata-iothreads
30: end-volume
31:
32: # Main server section
33: volume server
34: type protocol/server
35: option transport-type tcp
36: option transport.socket.listen-port 6997
37: subvolumes filedata
38: option auth.addr.filedata.allow 192.168.128.*   # streamers
39: option verify-volfile-checksum off              # don't have clients complain
40: end-volume
41:
+------------------------------------------------------------------------------+
[2009-06-28 08:42:13] N [glusterfsd.c:1152:main] glusterfs: Successfully started

Now let's pick a random client, for example streamer013, and see its log:

================================================================================
Version      : glusterfs 2.0.1 built on May 26 2009 05:23:52
TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
Starting Time: 2009-07-22 18:34:31
Command line : /usr/sbin/glusterfs2 --log-level=NORMAL --volfile-server=file02.priv --volfile-server-port=6997 /mnt/file02
PID          : 14519
System name  : Linux
Nodename     : streamer013
Kernel Release : 2.6.18-92.1.10.el5PAE
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
1: # filedata
2: volume filedata
3: type protocol/client
4: option transport-type tcp
5: option remote-host file02.priv
6: option remote-port 6997   # use non default to run in parallel
7: option remote-subvolume filedata
8: end-volume
9:
10: # Add readahead feature
11: volume readahead
12: type performance/read-ahead   # cache per file = (page-count x page-size)
13: # option page-size 256kB      # 256KB is the default option ?
14: # option page-count 2         # 16 is default option ?
15: subvolumes filedata
16: end-volume
17:
18: # Add threads
19: volume iothreads
20: type performance/io-threads
21: option thread-count 8
22: # option autoscaling on
23: # option min-threads 16
24: # option max-threads 256
25: subvolumes readahead
26: end-volume
27:
28: # Add IO-Cache feature
29: volume iocache
30: type performance/io-cache
31: option cache-size 64MB    # default is 32MB (in 1.3)
32: option page-size 256KB    # 128KB is default option (in 1.3)
33: subvolumes iothreads
34: end-volume
35:
+------------------------------------------------------------------------------+
[2009-07-22 18:34:31] N [glusterfsd.c:1152:main] glusterfs: Successfully started
[2009-07-22 18:34:31] N [client-protocol.c:5557:client_setvolume_cbk] filedata: Connected to 192.168.128.232:6997, attached to remote volume 'filedata'.
[2009-07-22 18:34:31] N [client-protocol.c:5557:client_setvolume_cbk] filedata: Connected to 192.168.128.232:6997, attached to remote volume 'filedata'.
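For completeness, both mounts are made the same way; the file01 command line is simply the mirror of the file02 one shown in the log above:

    /usr/sbin/glusterfs2 --log-level=NORMAL --volfile-server=file01.priv --volfile-server-port=6997 /mnt/file01
    /usr/sbin/glusterfs2 --log-level=NORMAL --volfile-server=file02.priv --volfile-server-port=6997 /mnt/file02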
The mounts seem OK:

[root@streamer013 /]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
glusterfs#file01.priv on /mnt/file01 type fuse (rw,max_read=131072,allow_other,default_permissions)
glusterfs#file02.priv on /mnt/file02 type fuse (rw,max_read=131072,allow_other,default_permissions)

They work:

[root@streamer013 /]# ls /mnt/file01/
cust
[root@streamer013 /]# ls /mnt/file02/
cust

And they are seen by both servers.

file01:

[2009-07-22 18:34:19] N [server-helpers.c:723:server_connection_destroy] server: destroyed connection of streamer013.p4.bt.bcn.flumotion.net-14335-2009/07/22-18:34:13:210609-filedata
[2009-07-22 18:34:31] N [server-protocol.c:7796:notify] server: 192.168.128.213:1017 disconnected
[2009-07-22 18:34:31] N [server-protocol.c:7796:notify] server: 192.168.128.213:1018 disconnected
[2009-07-22 18:34:31] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.128.213:1017
[2009-07-22 18:34:31] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.128.213:1018

file02:

[2009-07-22 18:34:20] N [server-helpers.c:723:server_connection_destroy] server: destroyed connection of streamer013.p4.bt.bcn.flumotion.net-14379-2009/07/22-18:34:13:267495-filedata
[2009-07-22 18:34:31] N [server-protocol.c:7796:notify] server: 192.168.128.213:1014 disconnected
[2009-07-22 18:34:31] N [server-protocol.c:7796:notify] server: 192.168.128.213:1015 disconnected
[2009-07-22 18:34:31] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.128.213:1015
[2009-07-22 18:34:31] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.128.213:1014

Now let's see the funny part. First, a listing of a particular directory, taken locally on both servers:

[root@file01 ~]# ls /file/data/cust/cust1
configs  files  outgoing  reports

[root@file02 ~]# ls /file/data/cust/cust1
configs  files  outgoing  reports

Now let's try to see the same from the client side:

[root@streamer013 /]# ls /mnt/file01/cust/cust1
ls: /mnt/file01/cust/cust1: No such file or directory

[root@streamer013 /]# ls /mnt/file02/cust/cust1
configs  files  outgoing  reports

Oops :( And the client log says:

[2009-07-22 18:41:22] W [fuse-bridge.c:1651:fuse_opendir] glusterfs-fuse: 64: OPENDIR (null) (fuse_loc_fill() failed)

Meanwhile, neither server log says anything. So the files really exist on the servers, but the same client can see them on one of the filers and not on the other, although both are running exactly the same software.

But there's more: it seems to happen only for certain directories (I can't show you their contents for privacy reasons, but I guess you'll get the idea):

[root@streamer013 /]# ls /mnt/file01/cust/|wc -l
95
[root@streamer013 /]# ls /mnt/file02/cust/|wc -l
95
[root@streamer013 /]# for i in `ls /mnt/file01/cust/`; do ls /mnt/file01/cust/$i; done|grep such
ls: /mnt/file01/cust/cust1: No such file or directory
ls: /mnt/file01/cust/cust2: No such file or directory
[root@streamer013 /]# for i in `ls /mnt/file02/cust/`; do ls /mnt/file02/cust/$i; done|grep such
[root@streamer013 /]#

And of course, the client log shows the error twice:

[2009-07-22 18:49:21] W [fuse-bridge.c:1651:fuse_opendir] glusterfs-fuse: 2119: OPENDIR (null) (fuse_loc_fill() failed)
[2009-07-22 18:49:21] W [fuse-bridge.c:1651:fuse_opendir] glusterfs-fuse: 2376: OPENDIR (null) (fuse_loc_fill() failed)
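In case anyone wants to reproduce the check, here is a slightly cleaned-up version of those loops (mount points as in this mail). The point of listing the parent first and then each entry is that readdir on cust/ returns all 95 names, while the per-directory lookup is what fails:

    for mnt in /mnt/file01 /mnt/file02; do
        for d in $(ls "$mnt"/cust/); do
            # readdir sees the entry, but the subsequent lookup may fail
            ls "$mnt/cust/$d" >/dev/null 2>&1 || echo "unreadable: $mnt/cust/$d"
        done
    done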
I hope I've been clear enough this time. If you need more data, just let me know and I'll see what I can do. And thanks again for your help.

Roger

On Wed, 2009-07-22 at 09:10 -0700, Anand Avati wrote:
> > I have been witnessing some strange behaviour with my GlusterFS system.
> > Fact is there are some files which exist and are completely accessible
> > in the server, while they can't be accessed from a client, while other
> > files do.
> >
> > To be sure, I copied the same files to another directory and I still was
> > unable to see them from the client. To be sure it wasn't any kind of
> > file permissions, selinux or whatever issue, I created a copy from a
> > working directory, and still wasn't seen from the client. All I get is
> > a:
> >
> > ls: .: No such file or directory
> >
> > And the client log says:
> >
> > [2009-07-22 14:04:18] W [fuse-bridge.c:1651:fuse_opendir]
> > glusterfs-fuse: 104778: OPENDIR (null) (fuse_loc_fill() failed)
> >
> > While the server log says nothing.
> >
> > Funniest thing is the same client has another GlusterFS mount to another
> > server, which has exactly the same contents as the first one, and this
> > mount does work.
> >
> > Some data:
> >
> > [root@streamer001 /]# ls /mnt/file01/cust/cust1/
> > ls: /mnt/file01/cust/cust1/: No such file or directory
> >
> > [root@streamer001 /]# ls /mnt/file02/cust/cust1/
> > configs  files  outgoing  reports
> >
> > [root@streamer001 /]# mount
> > /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
> > proc on /proc type proc (rw)
> > sysfs on /sys type sysfs (rw)
> > devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> > /dev/sda1 on /boot type ext3 (rw)
> > tmpfs on /dev/shm type tmpfs (rw)
> > none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> > sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> > glusterfs#file01.priv on /mnt/file01 type fuse
> > (rw,max_read=131072,allow_other,default_permissions)
> > glusterfs#file02.priv on /mnt/file02 type fuse
> > (rw,max_read=131072,allow_other,default_permissions)
> >
> > [root@file01 /]# ls /file/data/cust/cust1
> > configs  files  outgoing  reports
> >
> > [root@file02 /]# ls /file/data/cust/cust1
> > configs  files  outgoing  reports
> >
> > Any ideas?
>
> Can you please post all your client and server logs and volfiles? Are
> you quite certain that this is not a result of some misconfiguration?
>
> Avati