Can you try your tests by mounting with the --attribute-timeout=0 command line
parameter? (An example invocation is at the bottom of this mail.)

Avati

On Fri, Sep 4, 2009 at 12:01 PM, Robert L. Millner <rmillner at webappvm.com> wrote:
> Hi,
>
> We're observing a coherence issue with GlusterFS 2.0.6.  One client
> opens a file, locks, truncates and writes.  Another client waiting on a
> read lock may see a zero-length file after the read lock is granted.
>
> If both nodes read/write in a loop, this tends to happen within a few
> hundred tries.  The same code runs for 10000 loops without a problem if
> both programs run on GlusterFS on the same node or on a local file system
> (ext3) on the same node.
>
> Node1 does the following (strace):
>
> 2206  1252031615.509555 open("testfile", O_RDWR|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.514886 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.517742 select(0, NULL, NULL, NULL, {0, 0}) = 0 (Timeout)
> 2206  1252031615.517788 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.517829 ftruncate64(3, 0) = 0
> 2206  1252031615.520632 write(3, "01234567890123456789012345678901"..., 900) = 900
> 2206  1252031615.599782 close(3)        = 0
>
> 2206  1252031615.604731 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.615158 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.624680 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=900, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2206  1252031615.624787 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.624851 read(3, "01234567890123456789012345678901"..., 4096) = 900
> 2206  1252031615.625126 close(3)        = 0
>
>
> Node2 does the following (strace):
>
> 2126  1252031615.504350 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2126  1252031615.509004 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfc05dc8) = 0
> 2126  1252031615.587697 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=0, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2126  1252031615.588027 _llseek(3, 0, [0], SEEK_SET) = 0
> 2126  1252031615.588089 read(3, "", 4096) = 0
> 2126  1252031615.588228 close(3)        = 0
>
>
> Both node clocks are NTP-disciplined.  As these are virtual machines,
> there is higher dispersion, but I believe you can round to the nearest
> 0.1s for time correlation.
>
> Node2 waits for the write lock to clear before getting its read lock.
> Node1 also tries to read the file and agrees with Node2 on every stat
> field except st_size.  Node2 tries to read the file and gets no data.
>
> This is on 32-bit CentOS 5 with a 2.6.27 kernel and FUSE 2.7.4, on VMware.
> It is also observed on Amazon EC2 with their 2.6.21 fc8xen kernel.
>
> I can make the problem unrepeatable in 10000 tries by changing the
> select on Node1 to time out in 0.1 seconds.  The problem repeats in under
> 5000 tries if select is set to time out in 0.01 seconds.
>
> This happens whether or not gluster is run with
> --disable-direct-io-mode.
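
A minimal sketch of a test loop matching the traced sequence above (this is
not the original test program; the file name, record size and loop count are
illustrative placeholders) could look like this:

/* Writer: open, F_SETLKW write lock, truncate, write, close.
 * Reader: open, F_SETLKW read lock, fstat, read, close.
 * Build with: gcc -D_FILE_OFFSET_BITS=64 -o repro repro.c
 * (on 32-bit this maps to the fcntl64/fstat64 calls seen in the traces). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 900
#define LOOPS 10000

static void lock_whole_file(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); exit(1); }
}

int main(int argc, char **argv)
{
    int writer = (argc > 1 && strcmp(argv[1], "write") == 0);
    char buf[RECORD_SIZE];
    memset(buf, '0', sizeof(buf));

    for (int i = 0; i < LOOPS; i++) {
        int fd = open("testfile",
                      (writer ? O_RDWR : O_RDONLY) | O_CREAT, 0644);
        if (fd < 0) { perror("open"); exit(1); }

        if (writer) {
            lock_whole_file(fd, F_WRLCK);   /* exclusive lock */
            /* the original test also paused briefly here (select with a
             * zero/short timeout) before truncating and rewriting */
            ftruncate(fd, 0);
            write(fd, buf, sizeof(buf));
        } else {
            lock_whole_file(fd, F_RDLCK);   /* shared lock */
            struct stat st;
            fstat(fd, &st);
            ssize_t n = read(fd, buf, sizeof(buf));
            if (st.st_size == 0 || n == 0)
                printf("loop %d: st_size=%lld read=%zd\n",
                       i, (long long)st.st_size, n);
        }
        close(fd);                          /* releases the lock */
    }
    return 0;
}

Running it with argument "write" on one client and "read" on the other, with
the working directory on the GlusterFS mount, exercises the same
open/lock/truncate/write and open/lock/fstat/read pattern as the straces above.
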
>
> The volume is mirrored between four servers.  Below is the server
> configuration.  The export directory is on ext3.
>
> volume posix
>   type storage/posix
>   option directory /var/data/export
> end-volume
>
> volume locks
>   type features/locks
>   option mandatory-locks on
>   subvolumes posix
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 8
>   subvolumes locks
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick.allow *
>   subvolumes brick
> end-volume
>
>
> And the client configuration:
>
> volume remote1
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.145
>   option remote-subvolume brick
> end-volume
>
> volume remote2
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.130
>   option remote-subvolume brick
> end-volume
>
> volume remote3
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.221
>   option remote-subvolume brick
> end-volume
>
> volume remote4
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.104
>   option remote-subvolume brick
> end-volume
>
> volume replicated
>   type cluster/replicate
>   subvolumes remote1 remote2 remote3 remote4
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   subvolumes replicated
> end-volume
>
> volume cache
>   type performance/io-cache
>   subvolumes writebehind
> end-volume
>
>
> The problem persists with those configurations and if any or all of the
> following tweaks are made:
>
> 1. Remove the replicated volume and just use remote1.
> 2. Get rid of threads on the server.
> 3. Get rid of io-cache and writebehind on the clients.
> 4. Use mandatory locking on the test file.
>
> Please let me know if there's any more information needed to debug this
> further or any guidance on how to avoid it.
>
> Thank you!
>
>    Cheers,
>    Rob
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
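
For reference, the attribute-timeout suggestion at the top of this mail is
passed to the glusterfs client process at mount time, along these lines (the
volfile path and mount point are placeholders for your own setup):

glusterfs -f /etc/glusterfs/client.vol --attribute-timeout=0 /mnt/gluster

Setting the FUSE attribute timeout to zero makes the kernel revalidate file
attributes (including st_size) on each access instead of serving them from
its cache, which helps determine whether the stale size seen on Node2 comes
from kernel-side attribute caching.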