I apologize in advance for this - it's not complete, and I can't easily go back and test for the specific issues I ran across, as we're moving fast and trying to get things stable for production. I'll summarize the issues I saw and dealt with as best I can, in the hope that it'll be useful. I've retained two sets of configuration files, one for the single-server setup and one for the dual-server setup; see the end of the mail.

The performance issue (see previous thread) is present in all versions - I'm now fairly convinced that it's not a glusterfs issue directly, but rather some interaction between Apache/PHP and glusterfs/fuse - so set that aside for now.

Patch 628, fuse 2.7.2glfs8, one server with two bricks, two clients running unify (glusterfs-server.vol, glusterfs-client-old.vol)

1) Namespace showing files with sizes on the local disk. This occurred on files that we wrote to the glusterfs mount from the client. The namespace did *not* show filesystem space used when I checked it with "du -sh", but when I attempted to copy the files out from under the glusterfs namespace on the server, the whole file was copied (more on this in a second). In other words, these files in the namespace were acting similar to hard links. From what I understand, this problem has shown up before, but wasn't reproducible, so I missed this opportunity for troubleshooting.

Patch 628, fuse 2.7.2glfs8, two servers with two bricks, two clients running afr'd namespace and afr/unify on the bricks (glusterfs-server.vol, glusterfs-client.vol)

2) When I first set this up, I attempted to add the second server to the first, with the plan being to add afr and then run the find command from the documentation to trigger the afr self-heal. This turned out to be impossible: when I added the namespace mounts (one blank, one full) to the namespace afr volume, the namespace essentially stopped working.
I could mount the share and access any file directly by name, but I couldn't list directories, nor could I run the find command to trigger the afr. Once a file or directory was accessed, it would show up in the namespace. I ended up pulling directory listings from underneath the working namespace and running "head -c 1" on all of those files, one at a time, to get the namespace to come back. This partially worked, in that I was able to get the mount into a usable state, but it was not fully functional.

3) I know it wasn't fully functional because I then tried to wipe out the mount via the client using "rm -rf *". This failed in a number of interesting ways. One was a client deadlock - everything still running, but any attempt to access the mount resulted in a hung process (I was able to recover from this by terminating every process with an open file on the mount, then unmounting and remounting). There were also files that were apparently in the namespace, but not present (or accessible?) via the mount - I didn't get a chance to get a good look at this.

Patch 640, fuse 2.7.2glfs8, two servers, two clients running afr/unify (glusterfs-test.vol, glusterfs-test-client.vol)

4) I upgraded to 640 to avoid the "always writes files with the group as root" issue, which I checked for after seeing it on the mailing list, and found was occurring on our mount.

5) I then moved the brick directories and set up two separate glusterfsd configurations, so that I started with a clean slate and had them split into production and test mounts (on ports 6996 and 6997, respectively) that I could mount and unmount independently. I attempted to rsync about 200G of data from an NFS mount to the glusterfs mount. This went ok, except that the client deadlocked twice during the rsync. This deadlock had the same symptoms as in #3: no error messages, no indication of a problem, just hung processes and no access to the mount.
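Since this hang came up three times, here is the recovery procedure spelled out as commands. This is a sketch, not something I ran verbatim - the mount point and volfile path are from our setup (adjust for yours), and I've added a mountpoint guard so it bails out harmlessly if run on the wrong box:

```shell
#!/bin/sh
# Recover a hung glusterfs client mount by killing everything that
# holds it open, then cycling the mount.
MNT=${1:-/mnt/glusterfs}    # our mount point - adjust as needed

# Guard: do nothing if this isn't actually a mounted filesystem.
if ! mountpoint -q "$MNT"; then
    echo "not a mounted filesystem: $MNT"
    exit 0
fi

fuser -m "$MNT"       # list PIDs with open files on the mount first
fuser -km "$MNT"      # then SIGKILL them

# Lazy unmount as a fallback, since the umount itself can hang.
umount "$MNT" || umount -l "$MNT"
glusterfs -f /etc/glusterfs/glusterfs-client.vol "$MNT"
```

The fuser -m listing before the -km kill is there so you can sanity-check what's about to die; on a busy web server that list can include things you'd rather stop cleanly first.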
I fixed it in the same way, by closing all processes with open files, unmounting glusterfs, and remounting. I updated to 642 because of the 0-size file replication issue, so I'm now running 642.

********************
glusterfs-server.vol (corresponds with issues 1, 2, and 3 on the server side)

volume qbert-ns
  type storage/posix
  option directory /namespace
end-volume

volume qbert1
  type storage/posix
  option directory /mnt/qbert1
end-volume

volume qbert2
  type storage/posix
  option directory /mnt/qbert2
end-volume

volume qbert1-locks
  type features/posix-locks
# option mandatory on
  subvolumes qbert1
end-volume

volume qbert2-locks
  type features/posix-locks
# option mandatory on
  subvolumes qbert2
end-volume

volume qbert1-export
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes qbert1-locks
end-volume

volume qbert2-export
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes qbert2-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes qbert-ns qbert1-export qbert2-export
  option auth.ip.qbert-ns.allow *
  option auth.ip.qbert1-export.allow *
  option auth.ip.qbert2-export.allow *
end-volume

********************
glusterfs-client-old.vol (corresponds with issue 1)

volume qbert-ns-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert-ns
end-volume

volume qbert1-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert1-export
end-volume

volume qbert2-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert2-export
end-volume

volume unify
  type cluster/unify
  subvolumes qbert1-client qbert2-client
  option scheduler alu
  option namespace qbert-ns-client
  option alu.limits.min-free-disk 2GB
  option alu.order disk-usage
  option alu.disk-usage.entry-threshold 2GB
  option alu.disk-usage.exit-threshold 500MB
  option alu.stat-refresh.interval 10sec
# option self-heal off
end-volume

volume unify-ra
  type performance/read-ahead
  option page-size 1MB
  option page-count 16
  subvolumes unify
end-volume

volume unify-iocache
  type performance/io-cache
  option cache-size 512MB
  option page-size 1MB
  subvolumes unify-ra
end-volume

volume unify-writeback
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind off
  subvolumes unify-iocache
end-volume

********************
glusterfs-client.vol (corresponds to issues 2 and 3)

volume qbert-ns-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert-ns
end-volume

volume qbert1-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert1-export
end-volume

volume qbert2-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option transport-timeout 30
  option remote-subvolume qbert2-export
end-volume

volume pacman-ns-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option transport-timeout 30
  option remote-subvolume pacman-ns
end-volume

volume pacman1-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option transport-timeout 30
  option remote-subvolume pacman1-export
end-volume

volume pacman2-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option transport-timeout 30
  option remote-subvolume pacman2-export
end-volume

volume ns-afr
  type cluster/afr
  subvolumes qbert-ns-client pacman-ns-client
end-volume

volume 1-afr
  type cluster/afr
  subvolumes qbert1-client pacman1-client
end-volume

volume 2-afr
  type cluster/afr
  subvolumes qbert2-client pacman2-client
end-volume

volume unify
  type cluster/unify
  subvolumes 1-afr 2-afr
  option scheduler alu
  option namespace ns-afr
  option alu.limits.min-free-disk 2GB
  option alu.order disk-usage
  option alu.disk-usage.entry-threshold 2GB
  option alu.disk-usage.exit-threshold 500MB
  option alu.stat-refresh.interval 10sec
# option self-heal off
end-volume

volume unify-ra
  type performance/read-ahead
  option page-size 1MB
  option page-count 16
  subvolumes unify
end-volume

volume unify-iocache
  type performance/io-cache
  option cache-size 512MB
  option page-size 1MB
  subvolumes unify-ra
end-volume

volume unify-writeback
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind off
  subvolumes unify-iocache
end-volume

********************
glusterfs-test.vol (both servers - identical except for names; issues 4 and 5, currently in use)

# namespace volumes
volume qbert-test-ns
  type storage/posix
  option directory /mnt/qbert1/test-ns
end-volume

# base volumes
volume qbert1-test
  type storage/posix
  option directory /mnt/qbert1/test
end-volume

volume qbert2-test
  type storage/posix
  option directory /mnt/qbert2/test
end-volume

volume qbert1-test-locks
  type features/posix-locks
# option mandatory on
  subvolumes qbert1-test
end-volume

volume qbert2-test-locks
  type features/posix-locks
# option mandatory on
  subvolumes qbert2-test
end-volume

# io-threads should be just before protocol/server (always last). If you
# change this, make sure the last translator is still named -export, to
# avoid having to change the client config.
volume qbert1-test-export
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes qbert1-test-locks
end-volume

volume qbert2-test-export
  type performance/io-threads
  option thread-count 4
  option cache-size 64MB
  subvolumes qbert2-test-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6997
  option client-volume-filename /etc/glusterfs/glusterfs-test-client.vol
  subvolumes qbert-test-ns qbert1-test-export qbert2-test-export
  option auth.ip.qbert-test-ns.allow *
  option auth.ip.qbert1-test-export.allow *
  option auth.ip.qbert2-test-export.allow *
end-volume

********************
glusterfs-test-client.vol (issues 4 and 5, current)

# client connections
volume qbert-test-ns-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume qbert-test-ns
end-volume

volume qbert1-test-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume qbert1-test-export
end-volume

volume qbert2-test-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.40
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume qbert2-test-export
end-volume

volume pacman-test-ns-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume pacman-test-ns
end-volume

volume pacman1-test-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume pacman1-test-export
end-volume

volume pacman2-test-client
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.0.0.41
  option remote-port 6997
  option transport-timeout 30
  option remote-subvolume pacman2-test-export
end-volume

# afr volumes
volume test-ns-afr
  type cluster/afr
  subvolumes qbert-test-ns-client pacman-test-ns-client
end-volume

volume test-1-afr
  type cluster/afr
  subvolumes qbert1-test-client pacman1-test-client
end-volume

volume test-2-afr
  type cluster/afr
  subvolumes qbert2-test-client pacman2-test-client
end-volume

# unify
volume unify
  type cluster/unify
  subvolumes test-1-afr test-2-afr
  option scheduler alu
  option namespace test-ns-afr
  option alu.limits.min-free-disk 2GB
  option alu.order disk-usage
  option alu.disk-usage.entry-threshold 2GB
  option alu.disk-usage.exit-threshold 500MB
  option alu.stat-refresh.interval 10sec
# option self-heal off
end-volume

# performance translators
volume unify-ra
  type performance/read-ahead
  option page-size 1MB
  option page-count 16
  subvolumes unify
end-volume

volume unify-iocache
  type performance/io-cache
  option cache-size 512MB
  option page-size 1MB
  subvolumes unify-ra
end-volume

volume unify-writeback
  type performance/write-behind
  option aggregate-size 1MB
  option flush-behind off
  subvolumes unify-iocache
end-volume
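A footnote on the workaround in issue 2: both the documented find-based self-heal trigger and my "head -c 1" fallback do the same thing - force a lookup and read on every file through the mount so afr notices and heals each one. Roughly (the mount point is an assumption from our setup; I've added a directory guard):

```shell
#!/bin/sh
# Force afr self-heal by reading the first byte of every file
# through the glusterfs mount.
MNT=${1:-/mnt/glusterfs}    # adjust to your glusterfs mount point

if [ ! -d "$MNT" ]; then
    echo "no such directory: $MNT"
    exit 0
fi

# The documented trigger walks the mount itself; my fallback fed the
# same per-file read from listings taken underneath the working
# namespace, since find on the broken mount hung.
find "$MNT" -type f -exec head -c 1 {} \; > /dev/null
```

Note that this only heals what it touches - directories that are never listed or accessed stay missing from the namespace, which matches what I saw in issue 2.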