I'm seeing
a problem on my fairly fresh RHEL gluster
install. Smells to me like a parallelism problem
on the server.
If I mount a gluster volume via NFS (using glusterd's internal NFS
server, not nfs-kernel-server) and read a directory from multiple
clients *in parallel*, I get inconsistent results across the clients:
some files are missing from the directory listing, and some show up
twice!
Exactly which files (or directories!) are
missing/duplicated varies each time. But I can
very consistently reproduce the behaviour.
You can see a screenshot here:
http://imgur.com/JU8AFrt
The reproduction steps are:
* clusterssh to each NFS client
* umount /gv0 (to clear cache)
* mount /gv0 [1]
* ls -al /gv0/common/apache-jmeter-2.9/bin (which is where I first
  noticed this; a scripted version is below)
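If you don't have clusterssh handy, the same parallel hit can be
scripted with background ssh jobs along these lines (the client
hostnames are placeholders for my four NFS clients):

  #!/bin/bash
  clients="client1 client2 client3 client4"   # placeholder hostnames
  for c in $clients; do
      # remount to drop the NFS client cache, then list the directory
      ssh "$c" 'umount /gv0; mount /gv0; ls -al /gv0/common/apache-jmeter-2.9/bin' \
          > "ls.$c.out" 2>&1 &
  done
  wait
  # identical checksums = consistent listings; any difference is the bug
  md5sum ls.*.out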
Here's the rub: if, instead of doing the 'ls' in
parallel, I do it in series, it works just fine
(consistent correct results everywhere). But
hitting the gluster server from multiple clients
at the same time causes problems.
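The serial variant that behaves correctly is just the same loop without
the backgrounding (using the same placeholder $clients list as above):

  for c in $clients; do
      ssh "$c" 'umount /gv0; mount /gv0; ls -al /gv0/common/apache-jmeter-2.9/bin'
  done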
I can still stat() and open() the files missing from the directory
listing; they just don't show up in an enumeration.
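To be concrete, a file that has dropped out of the listing is still
perfectly reachable by name (the filename here is just an example from
that directory):

  ls /gv0/common/apache-jmeter-2.9/bin | grep -c jmeter.sh   # prints 0 on an affected client
  stat /gv0/common/apache-jmeter-2.9/bin/jmeter.sh           # succeeds
  head -1 /gv0/common/apache-jmeter-2.9/bin/jmeter.sh        # open()/read() work too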
Mounting gv0 as a gluster client filesystem
works just fine.
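(For comparison, the native-client mount I tested was along these
lines, with the mount point being illustrative:

  mount -t glusterfs fearless1:/gv0 /mnt/gv0-fuse
  ls -al /mnt/gv0-fuse/common/apache-jmeter-2.9/bin

and the listing is always complete and consistent there, which suggests
the brick contents themselves are fine.)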
Details of my setup:
2 × gluster servers: 2×E5-2670, 128GB RAM, RHEL
6.4 64-bit, glusterfs-server-3.3.1-1.el6.x86_64
(from EPEL)
4 × NFS clients: 2×E5-2660, 128GB RAM, RHEL 5.7
64-bit, glusterfs-3.3.1-11.el5 (from kkeithley's
repo, only used for testing)
gv0 volume information is below
bricks are 400GB SSDs with ext4[2]
common network is 10GbE; replication between the servers happens over a
direct 10GbE link.
I will be testing on xfs/btrfs/zfs eventually,
but for now I'm on ext4.
Also attached is my chatlog from asking about
this in #gluster
[1]: fstab line is:
fearless1:/gv0 /gv0 nfs
defaults,sync,tcp,wsize=8192,rsize=8192 0 0
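The equivalent one-off mount, if you'd rather not touch fstab, would be
something like:

  mount -t nfs -o sync,tcp,wsize=8192,rsize=8192 fearless1:/gv0 /gv0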
[2]: yes, I've turned off dir_index to avoid That Bug. I've run the
d_off test; results are here:
http://pastebin.com/zQt5gZnZ
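For reference, dir_index is the ext4 feature in question; turning it
off on a brick looks roughly like this (device and brick paths are
illustrative):

  umount /export/bricks/500117310007a6d8
  tune2fs -O ^dir_index /dev/sdb1   # placeholder device for that brick
  mount /export/bricks/500117310007a6d8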
----
gluster> volume info gv0
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 20117b48-7f88-4f16-9490-a0349afacf71
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: off
----
--
Michael Brown | `One of the main causes of the fall of
Systems Consultant | the Roman Empire was that, lacking zero,
Net Direct Inc. | they had no way to indicate successful
☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth