hello all,

i have been testing gluster as a central file server for a small animation studio/post production company. my initial experiments were using the fuse glusterfs protocol - but that ran extremely slowly for home dirs and general file sharing. we have since switched to using NFS over glusterfs. NFS has certainly seemed more responsive re. stat and dir traversal. however, i'm now being plagued with three different types of errors:

1/ Stale NFS file handle
2/ input/output errors
3/ and a new one:

$ l -l /n/auto/gv1/production/conan/hda/published/OLD/
ls: cannot access /n/auto/gv1/production/conan/hda/published/OLD/shot: Remote I/O error
total 0
d????????? ? ? ? ? ? shot

...so it's a bit all over the place. i've tried rebooting both servers and clients. these issues are very erratic - they come and go.

some information on my setup: glusterfs 3.1.2

g1:~ # gluster volume info

Volume Name: glustervol1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: g1:/mnt/glus1
Brick2: g2:/mnt/glus1
Brick3: g3:/mnt/glus1
Brick4: g4:/mnt/glus1
Brick5: g1:/mnt/glus2
Brick6: g2:/mnt/glus2
Brick7: g3:/mnt/glus2
Brick8: g4:/mnt/glus2
Options Reconfigured:
performance.write-behind-window-size: 1mb
performance.cache-size: 1gb
performance.stat-prefetch: 1
network.ping-timeout: 20
diagnostics.latency-measurement: off
diagnostics.dump-fd-stats: on

that is 4 servers - serving ~30 clients - 95% linux, 5% mac. all NFS.

other points:

- i'm automounting using NFS via autofs (with ldap). ie:

gus:/glustervol1 on /n/auto/gv1 type nfs (rw,vers=3,rsize=32768,wsize=32768,intr,sloppy,addr=10.0.0.13)

gus is pointing to rr dns machines (g1,g2,g3,g4). that all seems to be working.

- backend file system on g[1-4] is xfs.
ie,

g1:/var/log/glusterfs # xfs_info /mnt/glus1
meta-data=/dev/sdb1              isize=256    agcount=7, agsize=268435200 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1627196928, imaxpct=5
         =                       sunit=256    swidth=2560 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

- sometimes root can stat/read the file in question while the user cannot! i can remount the same NFS share to another mount point - and i can then see that with the same user.

- sample output of g1 nfs.log file:

[2011-02-18 15:27:07.201433] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /production/conan/hda/published/shot/backup/.svn/tmp/entries
[2011-02-18 15:27:07.201445] I [io-stats.c:353:io_stats_dump_fd] glustervol1: BytesWritten : 1414 bytes
[2011-02-18 15:27:07.201455] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 001024b+ : 1
[2011-02-18 15:27:07.205999] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.206032] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /production/conan/hda/published/shot/backup/.svn/props/tempfile.tmp
[2011-02-18 15:27:07.210799] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.210824] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /production/conan/hda/published/shot/backup/.svn/tmp/log
[2011-02-18 15:27:07.211904] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.211928] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /prod_data/xmas/lgl/pic/mr_all_PBR_HIGHNO_DF/035/1920x1080/mr_all_PBR_HIGHNO_DF.6084.exr
[2011-02-18 15:27:07.211940] I [io-stats.c:343:io_stats_dump_fd] glustervol1: Lifetime : 8731secs, 610796usecs
[2011-02-18 15:27:07.211951] I [io-stats.c:353:io_stats_dump_fd] glustervol1: BytesWritten : 2321370 bytes
[2011-02-18 15:27:07.211962] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 000512b+ : 1
[2011-02-18 15:27:07.211972] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 002048b+ : 1
[2011-02-18 15:27:07.211983] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 004096b+ : 4
[2011-02-18 15:27:07.212009] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 008192b+ : 4
[2011-02-18 15:27:07.212019] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 016384b+ : 20
[2011-02-18 15:27:07.212030] I [io-stats.c:365:io_stats_dump_fd] glustervol1: Write 032768b+ : 54
[2011-02-18 15:27:07.228051] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.228078] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /production/conan/hda/published/shot/backup/.svn/tmp/entries

...so, the files that are not working don't have lifetime or bytes read/written lines after their log entry. all very perplexing - and scary.

one thing that reliably fails is using svn working dirs on the gluster filesystem. nfs locks keep being dropped. this is temporarily fixed when i view the file as root (on a client) - but then re-appears very quickly. i assume that gluster is up to something as simple as having svn working dirs?

i'm hoping i've done something stupid which is easily fixed. we seem so close - but right now, i'm at a loss and losing confidence. i would greatly appreciate any help/pointers out there.

regards,

paul
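ps. in case it is useful to anyone digging through a similar nfs.log: below is a minimal python sketch (not part of gluster) that groups the io-stats fd dump lines per file and lists the files that have no lifetime/bytes lines after their Filename entry. the regex and the names parse_fd_stats / files_without_stats are hypothetical and based only on the log excerpt above; other gluster versions may well format these lines differently.

```python
import re

# Assumed line layout, inferred only from the nfs.log excerpt in this post:
# [timestamp] LEVEL [io-stats.c:NNN:io_stats_dump_fd] volume: body
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\w\s+\[io-stats\.c:\d+:io_stats_dump_fd\]\s+"
    r"(?P<vol>[^:]+):\s+(?P<body>.*)$"
)


def parse_fd_stats(lines):
    """Group fd-stats dump lines into one block per 'Filename' entry."""
    blocks = []
    current = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an io_stats_dump_fd line
        body = m.group("body").strip()
        if body.startswith("---"):
            current = None  # '--- fd stats ---' separator closes the block
            continue
        parts = re.split(r"\s+:\s+", body, maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "Filename":
            current = {"filename": value, "stats": {}}
            blocks.append(current)
        elif current is not None:
            current["stats"][key] = value  # e.g. Lifetime, BytesWritten
    return blocks


def files_without_stats(blocks):
    """Filenames whose block carries no Lifetime/Bytes*/Write lines at all."""
    return [b["filename"] for b in blocks if not b["stats"]]


SAMPLE = """\
[2011-02-18 15:27:07.210799] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.210824] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /production/conan/hda/published/shot/backup/.svn/tmp/log
[2011-02-18 15:27:07.211904] I [io-stats.c:333:io_stats_dump_fd] glustervol1: --- fd stats ---
[2011-02-18 15:27:07.211928] I [io-stats.c:338:io_stats_dump_fd] glustervol1: Filename : /prod_data/xmas/lgl/pic/mr_all_PBR_HIGHNO_DF/035/1920x1080/mr_all_PBR_HIGHNO_DF.6084.exr
[2011-02-18 15:27:07.211940] I [io-stats.c:343:io_stats_dump_fd] glustervol1: Lifetime : 8731secs, 610796usecs
[2011-02-18 15:27:07.211951] I [io-stats.c:353:io_stats_dump_fd] glustervol1: BytesWritten : 2321370 bytes
"""

if __name__ == "__main__":
    for name in files_without_stats(parse_fd_stats(SAMPLE.splitlines())):
        print(name)
```

to run it against a real log, pass the open file instead of SAMPLE.splitlines(), e.g. parse_fd_stats(open("nfs.log")). it only summarises what is already in the log; it says nothing about why those entries lack stats.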