On Wed, 8 Aug 2007, Krishna Srinivas wrote:
Hi Brent, Thanks. So if you use storage/posix directly under afr, you don't see the problem with NFS reexport.
Correct, that worked fine. Once I introduced protocol/client and protocol/server, though, rsync -aH /usr/ /mount/nfs0/ gives I/O errors and an inconsistent copy.
We are not able to reproduce this behaviour here.
Did you try with the spec files I sent you (they only need two directories available on a single machine), with an rsync of your /usr partition to the NFS reexport (this can also be done via localhost, no additional machines needed)? You are using the kernel NFS server, I assume, not one of the user-mode NFS servers?
Can you give us access to your machines? Is that possible?
Yes, if the above doesn't do the trick, we can coordinate some way to get you access. Do you have an SSH public key I could add as an authorized key?
Thanks, Brent
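For anyone trying the reproduction described above, one way to confirm an "inconsistent copy" after the rsync is to compare the source tree with the copy seen through the NFS reexport. The following is only a minimal sketch, not something from the thread: the function names are made up for illustration, and it compares names and regular-file sizes only (not contents, ownership, or hardlinks).

```python
import os
import stat


def list_tree(root):
    """Map each path under root (relative to root) to its size.

    Non-regular files (directories, symlinks, ...) map to None.
    Also returns the directories whose listing failed, e.g. with EIO.
    """
    entries = {}
    unreadable = []

    def on_error(exc):
        # os.walk reports directory-listing failures through this callback.
        unreadable.append(exc.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            try:
                st = os.lstat(full)
            except OSError:
                # The name was listed but cannot be stat'ed (e.g. a junk entry).
                entries[rel] = "lstat failed"
                continue
            entries[rel] = st.st_size if stat.S_ISREG(st.st_mode) else None
    return entries, unreadable


def compare_trees(src, dst):
    """Report differences between a source tree and its copy."""
    src_entries, src_unreadable = list_tree(src)
    dst_entries, dst_unreadable = list_tree(dst)
    common = set(src_entries) & set(dst_entries)
    return {
        "missing_in_dst": sorted(set(src_entries) - set(dst_entries)),
        "extra_in_dst": sorted(set(dst_entries) - set(src_entries)),
        "size_mismatch": sorted(
            p for p in common if src_entries[p] != dst_entries[p]
        ),
        "unreadable_dirs": sorted(src_unreadable + dst_unreadable),
    }
```

With the paths from the rsync command above, this would be called as compare_trees("/usr", "/mount/nfs0"); an empty result in all four lists suggests the copy is at least structurally complete.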
On 8/8/07, Brent A Nelson <brent@xxxxxxxxxxxx> wrote:
Today, I tried switching to the Gluster-modified fuse-2.7.0, but I still encountered the same misbehavior with NFS reexport. Heads-up: like someone else on the mailing list, I found that GlusterFS performance is MUCH slower with 2.7.0 than with my old 2.6.3, at least for simple "du" tests...
Failing that, I thought I'd try to figure out the simplest specs to exhibit the issue; see attached. I first tried glusterfs (no glusterfsd); it worked for a simple afr as well as unification of two afrs with no NFS reexport trouble. As soon as I introduced a glusterfsd exporting to the glusterfs via protocol/client and protocol/server (via localhost), however, the rsync problems appeared. I didn't see the issues with du in this simple setup, though (perhaps that problem will disappear when this problem is fixed, perhaps not).
Thanks,
Brent

On Tue, 7 Aug 2007, Krishna Srinivas wrote:
Hi Brent,
Those messages in log are harmless, I have removed them from the source. Can you mail the spec files? I will see again if it can be repro'd
Thanks
Krishna

On 8/7/07, Brent A Nelson <brent@xxxxxxxxxxxx> wrote:
I added debugging to all the AFR subvolumes.
On the du test, all it produced were lines like this over and over:
2007-08-06 17:23:41 C [dict.c:1094:data_to_ptr] libglusterfs/dict: @data=(nil)
For the rsync (in addition to the @data=(nil) messages):
rsync -a /tmp/blah/usr0/ /tmp/blah/nfs0/
rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
rsync: writefd_unbuffered failed to write 2672 bytes [sender]: Broken pipe (32)
rsync: close failed on "/tmp/blah/nfs0/games/.banner.vl3iqI": Operation not permitted (1)
rsync: connection unexpectedly closed (98 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(454) [sender=2.6.9]
The debug output is:
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-0) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-1) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-0) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-1) op_ret=-1 op_errno=61
This is new behavior; rsync didn't used to actually die, it just made incomplete copies.

On Tue, 7 Aug 2007, Krishna Srinivas wrote:
Hi Brent,
Can you put "option debug on" in afr subvolume and try the du/rsync operations and mail the log?
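
When a debug log like the one above gets long, it can help to pull the path, child, and errno out of each afr line and group them. Below is a rough sketch of that, not part of GlusterFS: the regular expression is inferred only from the four sample lines quoted in this thread, and the errno names are the Linux values (errno numbers are platform-specific). On Linux, 61 is ENODATA, which getxattr returns when the requested attribute does not exist.

```python
import re
from collections import Counter

# Linux errno values that appear in this thread (assumption: Linux numbering).
LINUX_ERRNO_NAMES = {1: "EPERM", 5: "EIO", 32: "EPIPE", 61: "ENODATA"}

# Pattern inferred from the sample afr debug lines; other log lines are ignored.
AFR_LINE = re.compile(
    r"\[(?P<source>[^\]]+)\]\s+(?P<volume>\S+):\s+"
    r"\(path=(?P<path>.*?) child=(?P<child>\S+)\)\s+"
    r"op_ret=(?P<op_ret>-?\d+)\s+op_errno=(?P<op_errno>\d+)"
)


def parse_afr_line(line):
    """Return a dict of fields for an afr debug line, or None if it does not match."""
    m = AFR_LINE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["op_ret"] = int(fields["op_ret"])
    fields["op_errno"] = int(fields["op_errno"])
    fields["errno_name"] = LINUX_ERRNO_NAMES.get(fields["op_errno"], "unknown")
    return fields


def summarize(lines):
    """Count failures per (volume, child, errno name)."""
    counts = Counter()
    for line in lines:
        fields = parse_afr_line(line)
        if fields is not None and fields["op_ret"] < 0:
            counts[(fields["volume"], fields["child"], fields["errno_name"])] += 1
    return counts
```

Feeding the log file's lines to summarize() gives a quick view of which children are returning which errors, which is easier to mail around than the raw log.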
We are not able to reproduce the problem here; NFS is working fine over afr.
Thanks
Krishna

On 8/4/07, Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:
rsync was failing for me without no_root_squash, so I thought that might have been the culprit. If I put no_root_squash, NFS over afr works fine for me. Yes, you are right: for some reason readdir() is not functioning properly, which I think is why the paths are getting corrupted. Will get back to you.
Thanks
Krishna

On 8/4/07, Brent A Nelson <brent@xxxxxxxxxxxx> wrote:
All of my tests were done with no_root_squash already, and all tests were done as root. Without AFR, gluster and NFS reexports work fine with du and rsync. With AFR, gluster by itself is fine, but du and rsync from an NFS client do not work properly. rsync gives lots of I/O errors and occasional "file has vanished" messages for paths where the last element is junk. du gives incorrect sizes (smaller than they should be) and occasionally gives "no such file or directory", also for paths where the last element is junk. See output below for examples of both kinds of junk. Perhaps if you could figure out how those paths are getting corrupted, the whole problem will be resolved...
Thanks,
Brent

On Sat, 4 Aug 2007, Krishna Srinivas wrote:
Hi Brent,
Can you add no_root_squash to the exports file, reexport, mount using NFS, and try to rsync as root to see if it works? Like:
"/mnt/gluster *(rw,no_root_squash,sync,fsid=3)"
Thanks
Krishna

On 8/4/07, Brent A Nelson <brent@xxxxxxxxxxxx> wrote:
Whoops, scratch that. I accidentally tested the 2nd GlusterFS directory, not the final NFS mount. Even with the GlusterFS reexport of the original GlusterFS, the issue is still present.
Thanks and sorry for the confusion,
Brent

On Fri, 3 Aug 2007, Brent A Nelson wrote:
I do have a workaround which can hide this bug, thanks to the wonderful flexibility of GlusterFS and the fact that it in itself is POSIX.
If I mount the GlusterFS as usual, but then use another glusterfs/glusterfsd pair to export and mount it and NFS reexport THAT, the problem does not appear. Presumably, server-side AFR instead of client-side would also bypass the issue (not tested)...
Thanks,
Brent

On Fri, 3 Aug 2007, Brent A Nelson wrote:
I turned off self-heal on all the AFR volumes, remounted and reexported (I didn't delete the data; let me know if that is needed).
du -sk /tmp/blah/* (via NFS)
du: cannot access `/tmp/blah/usr0/include/c++/4.1.2/\a': No such file or directory
171832 /tmp/blah/usr0
109476 /tmp/blah/usr0-copy
du: cannot access `/tmp/blah/usr1/include/sys/\337O\004': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/\v': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/&\324\004': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/drivers/\006': No such file or directory
117472 /tmp/blah/usr1
58392 /tmp/blah/usr1-copy
It appears that self-heal isn't the culprit.
Thanks,
Brent

On Fri, 3 Aug 2007, Krishna Srinivas wrote:
Hi Brent,
Can you turn self-heal off (option self-heal off) and see how it behaves?
Thanks
Krishna

On 8/3/07, Brent A Nelson <brent@xxxxxxxxxxxx> wrote:
A hopefully relevant strace snippet:
open("share/perl/5.8.8/unicore/lib/jt", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
getdents64(3, /* 6 entries */, 1048576) = 144
lstat64("share/perl/5.8.8/unicore/lib/jt/C.pl", {st_mode=S_IFREG|0644, st_size=220, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/U.pl", {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/D.pl", {st_mode=S_IFREG|0644, st_size=438, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/R.pl", {st_mode=S_IFREG|0644, st_size=426, ...}) = 0
getdents64(3, /* 0 entries */, 1048576) = 0
munmap(0xb7c63000, 1052672) = 0
close(3) = 0
open("share/perl/5.8.8/unicore/lib/gc_sc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
getdents64(3, 0xb7c63024, 1048576) = -1 EIO (Input/output error)
write(2, "rsync: readdir(\"/tmp/blah/usr0/s"..., 91rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)) = 91
write(2, "\n", 1 ) = 1
munmap(0xb7c63000, 1052672) = 0
close(3) = 0
Thanks,
Brent

On Thu, 2 Aug 2007, Brent A Nelson wrote:
NFS reexport of a unified GlusterFS seems to be working fine as of TLA 409. I can make identical copies of a /usr area local-to-glusterfs and glusterfs-to-glusterfs, hardlinks and all. Awesome!
However, this is not true when AFR is added to the mix (rsync glusterfs-to-glusterfs via NFS reexport):
rsync: readdir("/tmp/blah/usr0/lib/perl/5.8.8/auto/POSIX"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/i18n/locales"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/locale-langpack/en_GB/LC_MESSAGES"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/groff/1.18.1/font/devps"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man7"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/X11/xkb/symbols"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Africa"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Asia"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/America"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/Asia"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/doc"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/consolefonts"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc64"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-mips"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-parisc"): Input/output error (5)
file has vanished: "/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc/\#012"
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
...
Any ideas? Meanwhile, I'll try to track it down in strace (the output will be huge, but maybe I'll get lucky)...
Thanks,
Brent

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
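
The two symptoms reported throughout this thread (directory listings failing with Input/output error, and path elements that are junk such as \a or \337O\004) can be hunted for without rsync or du. Here is a small sketch that walks a tree and collects both; it is only an illustration, the function names are invented, and "junk" is approximated as any name containing a non-printable character, which may flag legitimate but unusual names.

```python
import os


def has_junk(name):
    """True if a file name contains control or otherwise non-printable characters.

    Bytes that are not valid in the filesystem encoding are decoded by Python
    as surrogate escapes, which also count as non-printable here.
    """
    return any(not ch.isprintable() for ch in name)


def scan(root):
    """Walk root and return (listing_errors, junk_paths).

    listing_errors is a list of (directory, errno) pairs for directories
    whose listing failed; junk_paths lists entries with suspicious names.
    """
    listing_errors = []
    junk_paths = []

    def on_error(exc):
        # Called by os.walk when a directory cannot be listed (e.g. EIO).
        listing_errors.append((exc.filename, exc.errno))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            if has_junk(name):
                junk_paths.append(os.path.join(dirpath, name))
    return listing_errors, junk_paths
```

Running scan("/tmp/blah/usr0") on the NFS client and again directly on the GlusterFS mount would show whether the errors and junk names only appear through the NFS reexport, as described in the reports above.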