Can you try your tests by mounting with the --attribute-timeout=0 command line
parameter? (An example invocation is at the bottom of this mail.)

Avati

On Fri, Sep 4, 2009 at 12:01 PM, Robert L. Millner <rmillner at webappvm.com> wrote:
> Hi,
>
> We're observing a coherence issue with GlusterFS 2.0.6.  One client
> opens a file, locks, truncates and writes.  Another client waiting on a
> read lock may see a zero-length file after the read lock is granted.
>
> If both nodes read/write in a loop, this tends to happen within a few
> hundred tries.  The same code runs for 10000 loops without a problem if
> both programs run on GlusterFS on the same node or on a local file system
> (ext3) on the same node.
>
> Node1 does the following (strace):
>
> 2206  1252031615.509555 open("testfile", O_RDWR|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.514886 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.517742 select(0, NULL, NULL, NULL, {0, 0}) = 0 (Timeout)
> 2206  1252031615.517788 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.517829 ftruncate64(3, 0) = 0
> 2206  1252031615.520632 write(3, "01234567890123456789012345678901"..., 900) = 900
> 2206  1252031615.599782 close(3)        = 0
>
> 2206  1252031615.604731 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2206  1252031615.615158 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfcaee78) = 0
> 2206  1252031615.624680 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=900, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2206  1252031615.624787 _llseek(3, 0, [0], SEEK_SET) = 0
> 2206  1252031615.624851 read(3, "01234567890123456789012345678901"..., 4096) = 900
> 2206  1252031615.625126 close(3)        = 0
>
>
> Node2 does the following (strace):
>
> 2126  1252031615.504350 open("testfile", O_RDONLY|O_CREAT|O_LARGEFILE, 0644) = 3
> 2126  1252031615.509004 fcntl64(3, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbfc05dc8) = 0
> 2126  1252031615.587697 fstat64(3, {st_dev=makedev(0, 13), st_ino=182932, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=0, st_atime=2009/09/03-19:33:35, st_mtime=2009/09/03-19:33:35, st_ctime=2009/09/03-19:33:35}) = 0
> 2126  1252031615.588027 _llseek(3, 0, [0], SEEK_SET) = 0
> 2126  1252031615.588089 read(3, "", 4096) = 0
> 2126  1252031615.588228 close(3)        = 0
>
>
> Both node clocks are NTP-disciplined.  As these are virtual machines,
> there is higher dispersion, but I believe you can round to the nearest
> 0.1s for time correlation.
>
> Node2 waits for the write lock to clear before getting its read lock.
> Node1 also tries to read the file and agrees with Node2 on every stat
> field except st_size.  Node2 tries to read the file and gets no data.
>
> This is on 32-bit CentOS 5 with a 2.6.27 kernel and FUSE 2.7.4, on VMware.
> It is also observed on Amazon EC2 with their 2.6.21 fc8xen kernel.
>
> I can make the problem unrepeatable in 10000 tries by changing the
> select on Node1 to time out in 0.1 seconds.  The problem repeats in under
> 5000 tries if select is set to time out in 0.01 seconds.
>
> This happens whether or not gluster is run with
> --disable-direct-io-mode.
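
A minimal sketch of a test loop matching the traced sequence above (this is
not the original test program; the file name, record size and loop count are
illustrative placeholders) could look like this:

/* Writer: open, F_SETLKW write lock, truncate, write, close.
 * Reader: open, F_SETLKW read lock, fstat, read, close.
 * Build with: gcc -D_FILE_OFFSET_BITS=64 -o repro repro.c
 * (on 32-bit this maps to the fcntl64/fstat64 calls seen in the traces). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 900
#define LOOPS 10000

static void lock_whole_file(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); exit(1); }
}

int main(int argc, char **argv)
{
    int writer = (argc > 1 && strcmp(argv[1], "write") == 0);
    char buf[RECORD_SIZE];
    memset(buf, '0', sizeof(buf));

    for (int i = 0; i < LOOPS; i++) {
        int fd = open("testfile",
                      (writer ? O_RDWR : O_RDONLY) | O_CREAT, 0644);
        if (fd < 0) { perror("open"); exit(1); }

        if (writer) {
            lock_whole_file(fd, F_WRLCK);   /* exclusive lock */
            /* the original test also paused briefly here (select with a
             * zero/short timeout) before truncating and rewriting */
            ftruncate(fd, 0);
            write(fd, buf, sizeof(buf));
        } else {
            lock_whole_file(fd, F_RDLCK);   /* shared lock */
            struct stat st;
            fstat(fd, &st);
            ssize_t n = read(fd, buf, sizeof(buf));
            if (st.st_size == 0 || n == 0)
                printf("loop %d: st_size=%lld read=%zd\n",
                       i, (long long)st.st_size, n);
        }
        close(fd);                          /* releases the lock */
    }
    return 0;
}

Running it with argument "write" on one client and "read" on the other, with
the working directory on the GlusterFS mount, exercises the same
open/lock/truncate/write and open/lock/fstat/read pattern as the straces above.
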
>
> The volume is mirrored between four servers.  Below is the server
> configuration.  The export directory is on ext3.
>
> volume posix
>   type storage/posix
>   option directory /var/data/export
> end-volume
>
> volume locks
>   type features/locks
>   option mandatory-locks on
>   subvolumes posix
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 8
>   subvolumes locks
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick.allow *
>   subvolumes brick
> end-volume
>
>
> And the client configuration:
>
> volume remote1
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.145
>   option remote-subvolume brick
> end-volume
>
> volume remote2
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.130
>   option remote-subvolume brick
> end-volume
>
> volume remote3
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.221
>   option remote-subvolume brick
> end-volume
>
> volume remote4
>   type protocol/client
>   option transport-type tcp
>   option remote-host 10.10.10.104
>   option remote-subvolume brick
> end-volume
>
> volume replicated
>   type cluster/replicate
>   subvolumes remote1 remote2 remote3 remote4
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   subvolumes replicated
> end-volume
>
> volume cache
>   type performance/io-cache
>   subvolumes writebehind
> end-volume
>
>
> The problem persists with those configurations and if any or all of the
> following tweaks are made:
>
> 1. Remove the replicated volume and just use remote1.
> 2. Get rid of threads on the server.
> 3. Get rid of io-cache and writebehind on the clients.
> 4. Use mandatory locking on the test file.
>
> Please let me know if there's any more information needed to debug this
> further or any guidance on how to avoid it.
>
> Thank you!
>
>    Cheers,
>    Rob
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
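
For reference, the attribute-timeout suggestion at the top of this mail is
passed to the glusterfs client process at mount time, along these lines (the
volfile path and mount point are placeholders for your own setup):

glusterfs -f /etc/glusterfs/client.vol --attribute-timeout=0 /mnt/gluster

Setting the FUSE attribute timeout to zero makes the kernel revalidate file
attributes (including st_size) on each access instead of serving them from
its cache, which helps determine whether the stale size seen on Node2 comes
from kernel-side attribute caching.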