paul simpson wrote:
> hello all,
>
> i have been testing gluster as a central file server for a small animation
> studio/post-production company. my initial experiments were using the fuse
> glusterfs protocol - but that ran extremely slowly for home dirs and
> general file sharing. we have since switched to using NFS over glusterfs.
> NFS has certainly seemed more responsive re. stat and dir traversal.
> however, i'm now being plagued by three different types of errors:
>
> 1/ Stale NFS file handle
> 2/ input/output errors
> 3/ and a new one:
>
> $ l -l /n/auto/gv1/production/conan/hda/published/OLD/
> ls: cannot access /n/auto/gv1/production/conan/hda/published/OLD/shot:
> Remote I/O error
> total 0
> d????????? ? ? ? ? ? shot
>
> ...so it's a bit all over the place. i've tried rebooting both servers
> and clients. these issues are very erratic - they come and go.
>
> some information on my setup: glusterfs 3.1.2
>
> g1:~ # gluster volume info
>
> Volume Name: glustervol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: g1:/mnt/glus1
> Brick2: g2:/mnt/glus1
> Brick3: g3:/mnt/glus1
> Brick4: g4:/mnt/glus1
> Brick5: g1:/mnt/glus2
> Brick6: g2:/mnt/glus2
> Brick7: g3:/mnt/glus2
> Brick8: g4:/mnt/glus2
> Options Reconfigured:
> performance.write-behind-window-size: 1mb
> performance.cache-size: 1gb
> performance.stat-prefetch: 1
> network.ping-timeout: 20
> diagnostics.latency-measurement: off
> diagnostics.dump-fd-stats: on
>
> that is 4 servers - serving ~30 clients - 95% linux, 5% mac. all NFS.

Mac OS as an NFS client remains untested against Gluster NFS. Do you see
these errors on Mac or Linux clients?

> other points:
>
> - i'm automounting using NFS via autofs (with ldap), ie:
> gus:/glustervol1 on /n/auto/gv1 type nfs
> (rw,vers=3,rsize=32768,wsize=32768,intr,sloppy,addr=10.0.0.13)
> gus is pointing to round-robin dns across the machines (g1,g2,g3,g4).
> that all seems to be working.
>
> - the backend file system on g[1-4] is xfs, ie:
>
> g1:/var/log/glusterfs # xfs_info /mnt/glus1
> meta-data=/dev/sdb1              isize=256    agcount=7, agsize=268435200 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1627196928, imaxpct=5
>          =                       sunit=256    swidth=2560 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=32768, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> - sometimes root can stat/read the file in question while the user
> cannot! i can remount the same NFS share on another mount point - and i
> can then see the file as the same user.

I think that may be occurring because NFS+LDAP requires a slightly
different authentication scheme compared to an NFS-only setup. Please try
the same test without LDAP in the middle.

> - sample output of the g1 nfs.log file:
>
> [2011-02-18 15:27:07.201433] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1: Filename :
> /production/conan/hda/published/shot/backup/.svn/tmp/entries
> [2011-02-18 15:27:07.201445] I [io-stats.c:353:io_stats_dump_fd]
> glustervol1: BytesWritten : 1414 bytes
> [2011-02-18 15:27:07.201455] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 001024b+ : 1
> [2011-02-18 15:27:07.205999] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.206032] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1: Filename :
> /production/conan/hda/published/shot/backup/.svn/props/tempfile.tmp
> [2011-02-18 15:27:07.210799] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.210824] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1: Filename :
> /production/conan/hda/published/shot/backup/.svn/tmp/log
> [2011-02-18 15:27:07.211904] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.211928] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1: Filename :
> /prod_data/xmas/lgl/pic/mr_all_PBR_HIGHNO_DF/035/1920x1080/mr_all_PBR_HIGHNO_DF.6084.exr
> [2011-02-18 15:27:07.211940] I [io-stats.c:343:io_stats_dump_fd]
> glustervol1: Lifetime : 8731secs, 610796usecs
> [2011-02-18 15:27:07.211951] I [io-stats.c:353:io_stats_dump_fd]
> glustervol1: BytesWritten : 2321370 bytes
> [2011-02-18 15:27:07.211962] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 000512b+ : 1
> [2011-02-18 15:27:07.211972] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 002048b+ : 1
> [2011-02-18 15:27:07.211983] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 004096b+ : 4
> [2011-02-18 15:27:07.212009] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 008192b+ : 4
> [2011-02-18 15:27:07.212019] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 016384b+ : 20
> [2011-02-18 15:27:07.212030] I [io-stats.c:365:io_stats_dump_fd]
> glustervol1: Write 032768b+ : 54
> [2011-02-18 15:27:07.228051] I [io-stats.c:333:io_stats_dump_fd]
> glustervol1: --- fd stats ---
> [2011-02-18 15:27:07.228078] I [io-stats.c:338:io_stats_dump_fd]
> glustervol1: Filename :
> /production/conan/hda/published/shot/backup/.svn/tmp/entries
>
> ...so, the files that aren't working have no Lifetime or bytes
> read/written lines after their log entry.

I'll need the log from the NFS server at TRACE log level while you run a
command that results in any of the errors above, i.e. stale file handle,
remote I/O error and input/output error. Thanks.
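For reference, here's roughly how I'd capture that - a sketch, assuming
3.1.2 honours the diagnostics.client-log-level volume option for the NFS
server process (the NFS server is built from the client-side translator
stack, so it should):

g1:~ # gluster volume set glustervol1 diagnostics.client-log-level TRACE

Then, from a client, reproduce one of the failures, e.g.:

$ ls -l /n/auto/gv1/production/conan/hda/published/OLD/

and send us /var/log/glusterfs/nfs.log from whichever server that client's
mount resolved to (the addr= field in your mount output tells you which).
TRACE is extremely verbose, so turn it back down once you have the log:

g1:~ # gluster volume set glustervol1 diagnostics.client-log-level INFO

If your build rejects the option, restarting the glusterfs NFS process
with --log-level=TRACE achieves the same thing.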
> all very perplexing - and scary. one thing that reliably fails is using
> svn working dirs on the gluster filesystem. nfs locks keep being dropped.
> this is temporarily fixed when i view the file as root (on a client) -
> but then it re-appears very quickly. i assume that gluster is up to
> something as simple as hosting svn working dirs?
>
> i'm hoping i've done something stupid which is easily fixed. we seem so
> close - but right now, i'm at a loss and losing confidence. i would
> greatly appreciate any help/pointers out there.
>
> regards,
>
> paul
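PS: for the "without LDAP in the middle" test above, something along these
lines should do. A sketch only - /mnt/test and the testuser account are
placeholders for a mount point and a locally defined (non-LDAP) user on a
test client. Mounting one server directly also takes autofs and the
round-robin dns out of the equation:

client:~ # mount -t nfs -o vers=3,rsize=32768,wsize=32768,intr \
    g1:/glustervol1 /mnt/test
client:~ # su - testuser -c \
    'stat /mnt/test/production/conan/hda/published/OLD/shot'

If the local user can stat files that an LDAP user cannot, that points at
the auth/id-mapping path rather than gluster itself.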
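PS2: on the svn lock drops - svn working copies depend on NFS locking, so
it is worth checking whether a network lock manager is registered on the
servers at all. From any client:

$ rpcinfo -p g1

If no nlockmgr (program 100021) entry appears alongside the Gluster NFS
registrations, lock requests have nowhere to go, which would match the
behaviour you describe.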