We have a two-node GlusterFS 3.6.5 cluster with a distributed-replicate volume (2 x 2
bricks, all formatted ext4 on CentOS 6.6) that regularly omits
some files from directory listings on the client side and also regularly
lists some other files twice.
Summary of the issue and steps we've tried so far:
- There is only one client system connected to this volume.
- That client populates the volume by copying files from a local filesystem
into the Gluster mount point with 'cp', from a single process (a
single-threaded Python script that uses call() to run cp in a subprocess
shell). Because there is only one writer and the files are copied
sequentially, we believe we have ruled out concurrency and race-condition
problems.
- The two Gluster servers provide 7 volumes in total, but only one of the
volumes has been observed with this behavior.
- There are no errors or warnings in the Gluster logs on either the client or the servers.
- We have tried clearing all the extended attributes on all the bricks, but
that did not resolve the problem.
- We have deleted everything on the brick filesystems (including .glusterfs/),
but copying the files over again (via the Gluster mount point on the client)
produces the same missing and duplicate entries.
- We ran a rebalance/fix-layout on the volume, but that did not resolve the
problem.
- Interestingly, the set of files missing from the directory listing is the
same each time we delete everything and start again with an empty directory,
and the set of files listed twice is also the same each time.
- When all of the files have been copied to the Gluster volume, running 'ls'
on the client shows most, but not all, of the files. Examining the bricks
directly shows that all of the files are present (and properly distributed
and replicated). If 'rm *' is then run on the client, all of the files that
were visible are deleted, but the files that had not been visible now appear
in the 'ls' output, and some of them are listed twice. Examining the bricks
directly again shows that every file in the client's ls output is present,
with no improper duplicates (only the correctly replicated copies that should
be there). A second 'rm *' correctly deletes all of the remaining files, both
from the client's view and from the underlying bricks.
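To make the comparison between the client's view and the bricks repeatable, the two listings can be diffed mechanically instead of by eye. The sketch below is a hypothetical helper (the name compare_listings is ours, not part of Gluster); it assumes the names have already been collected separately, e.g. with os.listdir on the client mount and on each brick's copy of the directory.

```python
from collections import Counter


def compare_listings(client_names, brick_listings):
    """Compare a client-side directory listing with the raw brick contents.

    client_names:   names returned on the client (e.g. os.listdir on the
                    FUSE mount); may contain duplicates.
    brick_listings: dict of brick label -> names found directly on that
                    brick's copy of the same directory.

    Returns (missing, duplicated):
      missing    - sorted names present on at least one brick but absent
                   from the client listing
      duplicated - dict of names the client listed more than once -> count
    """
    on_bricks = set()
    for names in brick_listings.values():
        on_bricks.update(names)
    # Ignore Gluster's internal metadata directory if a brick root is listed.
    on_bricks.discard(".glusterfs")

    client_counts = Counter(client_names)
    missing = sorted(on_bricks - set(client_counts))
    duplicated = {n: c for n, c in sorted(client_counts.items()) if c > 1}
    return missing, duplicated
```

Running it after the first copy and again after the first 'rm *' would show whether the "missing" set before the delete matches the set that reappears afterwards.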
As requested on IRC, below is the getfattr output for a file that was missing
from the initial directory listing on the client, along with the getfattr
output for its parent directory. I've included that directory from all four
bricks, although in this distributed-replicate layout the file itself was
(correctly) located only on the /export/zones1 brick of each Gluster host.
I'll follow up with the same output for a file that appeared correctly from
the beginning once I can get the client I'm doing this for to repeat the test,
this time pausing after the initial copy and before deleting the visible files.
FWIW, these files were copied to an empty volume after a rebalance operation
had been run.
(Host gluster-001)
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim\ ELA\ PT\ Beetles\ \(IAB\)_2015-08-11.tar.gz.gpg
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim ELA PT Beetles (IAB)_2015-08-11.tar.gz.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0x8823094f0ea14f049bbc4f98895f7192
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x0000000100000000000000007fffd0ea
-bash-4.1# getfattr -d -e hex -m . /export/zones2/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones2/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-2=0x000000000000000000000000
trusted.afr.zones-client-3=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x00000001000000007fffd0ebffffffff
(Host gluster-002)
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim\ ELA\ PT\ Beetles\ \(IAB\)_2015-08-11.tar.gz.gpg
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim ELA PT Beetles (IAB)_2015-08-11.tar.gz.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0x8823094f0ea14f049bbc4f98895f7192
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x0000000100000000000000007fffd0ea
-bash-4.1# getfattr -d -e hex -m . /export/zones2/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones2/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-2=0x000000000000000000000000
trusted.afr.zones-client-3=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x00000001000000007fffd0ebffffffff
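To sanity-check the directory layout above, the trusted.glusterfs.dht values can be decoded and checked for gaps or overlaps in the hash space. This is a sketch under an assumption: the 16-byte value is treated as four big-endian 32-bit words whose last two are the start and end of the hash range assigned to that brick's replica set; the first two words are returned but not interpreted. The function names are hypothetical.

```python
import struct


def decode_dht_layout(hex_value):
    """Decode a trusted.glusterfs.dht value as printed by `getfattr -e hex`.

    Assumption: four big-endian 32-bit words; the last two are the start
    and end (inclusive) of this brick's hash range.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    w0, w1, start, end = struct.unpack(">IIII", raw)
    return w0, w1, start, end


def check_coverage(ranges):
    """Return a list of problems if the (start, end) ranges do not tile
    0x00000000-0xffffffff exactly once; an empty list means full coverage."""
    problems = []
    expected = 0
    for start, end in sorted(ranges):
        if start != expected:
            problems.append("expected range to start at 0x%08x, got 0x%08x"
                            % (expected, start))
        expected = end + 1
    if expected != 0x100000000:
        problems.append("hash space not covered up to 0xffffffff")
    return problems
```

Applied to the two values above, the ranges meet at 0x7fffd0ea/0x7fffd0eb and together cover the full 32-bit space, and each host reports the same values. That suggests, though it does not prove, that the parent directory's layout is consistent.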
Volume configuration server-side:
-bash-4.1# mount | grep zones
/dev/mapper/vg.zones1-lv.zones1 on /export/zones1 type ext4 (rw,noatime)
/dev/mapper/vg.zones2-lv.zones2 on /export/zones2 type ext4 (rw,noatime)
-bash-4.1# gluster volume info zones
Volume Name: zones
Type: Distributed-Replicate
Volume ID: 53ff45b1-8dc7-47ef-8a26-3245414e4990
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.1.122:/export/zones1/brick
Brick2: 10.1.1.121:/export/zones1/brick
Brick3: 10.1.1.122:/export/zones2/brick
Brick4: 10.1.1.121:/export/zones2/brick
Options Reconfigured:
client.ssl: off
server.ssl: off
performance.cache-size: 256MB
auth.ssl-allow: *
-bash-4.1# gluster volume status zones
Status of volume: zones
Gluster process Port Online Pid
---------------------------------------------------------------------------
Brick 10.1.1.122:/export/zones1/brick 49165 Y 25189
Brick 10.1.1.121:/export/zones1/brick 49164 Y 697
Brick 10.1.1.122:/export/zones2/brick 49166 Y 25194
Brick 10.1.1.121:/export/zones2/brick 49161 Y 703
NFS Server on localhost 2049 Y 25213
Self-heal Daemon on localhost N/A Y 25222
NFS Server on 10.1.1.121 2049 Y 719
Self-heal Daemon on 10.1.1.121 N/A Y 736
Task Status of Volume zones
---------------------------------------------------------------------------
Task : Rebalance
ID : 75f0b7ae-ed26-417b-a285-9ad81e40073c
Status : completed
Mountpoint on client side:
-bash-4.1# mount | grep zones
10.1.1.122:/zones on /opt/edware/zones type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
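For reading the trusted.afr.zones-client-N attributes in the getfattr output: in a distributed-replicate volume, bricks are grouped into replica sets consecutively in the order listed by 'gluster volume info', so Brick1/Brick2 (zones1) form one replica pair and Brick3/Brick4 (zones2) the other. The sketch below additionally assumes that the client-N index is the zero-based brick position, which is consistent with the zones1 bricks carrying client-0/client-1 and the zones2 bricks carrying client-2/client-3. replica_sets is a hypothetical helper.

```python
def replica_sets(volume, bricks, replica_count):
    """Group bricks into replica sets (consecutive bricks, in listed order).

    Each member is returned as ("<volume>-client-<N>", brick), assuming N is
    the zero-based position of the brick in `gluster volume info` output.
    """
    if len(bricks) % replica_count:
        raise ValueError("brick count must be a multiple of the replica count")
    sets = []
    for first in range(0, len(bricks), replica_count):
        members = [("%s-client-%d" % (volume, i), bricks[i])
                   for i in range(first, first + replica_count)]
        sets.append(members)
    return sets
```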
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users