Hi,

We're observing a coherence issue with GlusterFS 2.0.6. One client opens a file, takes a write lock, truncates it, and writes. Another client waiting on a read lock may see a zero-length file after the read lock is granted. If both nodes read/write in a loop, this tends to happen within a few hundred tries. The same code runs for 10000 loops without a problem if both programs run against GlusterFS on the same node, or against a local ext3 file system on the same node.

Node1 does the following (strace):

2206 1252031615.509555 open("testfile", O_RDWR|O_CREAT|O_LARGEFILE, 0644) = 3
2206 1252031615.514886 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
2206 1252031615.517742 select(0, NULL, NULL, NULL, {0, 0}) = 0 (Timeout)
2206 1252031615.517788 _llseek(3, 0, [0], SEEK_SET) = 0
2206 1252031615.517829 ftruncate64(3, 0) = 0
2206 1252031615.520632 write(3, "01234567890123456789012345678901"..., 900) = 900
2206 1252031615.599782 close(3) = 0
2206 1252031615.604731 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
2206 1252031615.615158 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
2206 1252031615.624680 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=900, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
2206 1252031615.624787 _llseek(3, 0, [0], SEEK_SET) = 0
2206 1252031615.624851 read(3, "01234567890123456789012345678901"..., 4096) = 900
2206 1252031615.625126 close(3) = 0

Node2 does the following (strace):

2126 1252031615.504350 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
2126 1252031615.509004 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfc05dc8) = 0
2126 1252031615.587697 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=0, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
2126 1252031615.588027 _llseek(3, 0, [0], SEEK_SET) = 0
2126 1252031615.588089 read(3, "", 4096) = 0
2126 1252031615.588228 close(3) = 0

Both node clocks are NTP-disciplined. As these are virtual machines there is higher clock dispersion, but I believe you can round to the nearest 0.1 s for time correlation. Node2 waits for the write lock to clear before getting its read lock. Node1 then reads the file back and agrees with Node2 on every stat field except st_size; Node2 reads the file and gets no data.

This is on 32-bit CentOS 5 with a 2.6.27 kernel and fuse 2.7.4, running on VMware. The same behaviour is also observed on Amazon EC2 with their 2.6.21 fc8xen kernel.

I can make the problem unreproducible in 10000 tries by changing the select() on Node1 to time out after 0.1 seconds. The problem reproduces in under 5000 tries if the select() timeout is set to 0.01 seconds. This happens whether or not gluster is run with --disable-direct-io-mode. The volume is mirrored between four servers.
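For reference, here is a minimal sketch in C of what the two test programs boil down to. The file name, open flags, locking calls, select() pause, and 900-byte payload are taken from the straces above; the loop count, the writer/reader command-line switch, and the omitted error handling are simplifications, not the exact test code.

/* locktest.c - rough sketch of the test.
 * Run "./locktest write" on Node1 and "./locktest" on Node2.
 * Loop structure and (lack of) error handling are simplified. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/stat.h>

static void lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;             /* F_WRLCK or F_RDLCK             */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 = lock the whole file        */
    fcntl(fd, F_SETLKW, &fl);     /* blocking lock, as in the strace */
}

/* Node1: take a write lock, truncate, rewrite 900 bytes. */
static void write_pass(const char *path)
{
    char buf[900];
    int i, fd = open(path, O_RDWR | O_CREAT, 0644);
    struct timeval tv = {0, 0};   /* this timeout governs how fast the
                                     problem reproduces (see above)  */

    for (i = 0; i < (int)sizeof buf; i++)
        buf[i] = '0' + i % 10;

    lock_whole_file(fd, F_WRLCK);
    select(0, NULL, NULL, NULL, &tv);
    lseek(fd, 0, SEEK_SET);
    ftruncate(fd, 0);
    write(fd, buf, sizeof buf);
    close(fd);                    /* close releases the lock         */
}

/* Node2 (and Node1's read-back): take a read lock and read the file. */
static ssize_t read_pass(const char *path)
{
    char buf[4096];
    ssize_t n;
    struct stat st;
    int fd = open(path, O_RDONLY | O_CREAT, 0644);

    lock_whole_file(fd, F_RDLCK);
    fstat(fd, &st);               /* Node2 sees st_size == 0 here    */
    lseek(fd, 0, SEEK_SET);
    n = read(fd, buf, sizeof buf);
    close(fd);
    return n;                     /* 0 instead of 900 on failure     */
}

int main(int argc, char **argv)
{
    int i, writer = (argc > 1 && strcmp(argv[1], "write") == 0);

    for (i = 0; i < 10000; i++) {
        if (writer) {
            write_pass("testfile");
            read_pass("testfile");            /* Node1 reads back 900 bytes */
        } else if (read_pass("testfile") == 0) {
            printf("zero-length read on iteration %d\n", i);
            return 1;
        }
    }
    return 0;
}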
Below is the server configuration. The export directory is on ext3.

volume posix
  type storage/posix
  option directory /var/data/export
end-volume

volume locks
  type features/locks
  option mandatory-locks on
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

And the client configuration:

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host 10.10.10.145
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host 10.10.10.130
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host 10.10.10.221
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host 10.10.10.104
  option remote-subvolume brick
end-volume

volume replicated
  type cluster/replicate
  subvolumes remote1 remote2 remote3 remote4
end-volume

volume writebehind
  type performance/write-behind
  subvolumes replicated
end-volume

volume cache
  type performance/io-cache
  subvolumes writebehind
end-volume

The problem persists with those configurations, and also if any or all of the following tweaks are made:

1. Remove the replicated volume and just use remote1.
2. Get rid of the io-threads on the server.
3. Get rid of io-cache and write-behind on the clients.
4. Use mandatory locking on the test file.

Please let me know if there's any more information needed to debug this further, or any guidance on how to avoid it. Thank you!

Cheers,
Rob