Hi Shawn,

Can you give us the exact rsync command you used?

Thanks
Krishna

On 4/3/07, Shawn Northart <shawn@xxxxxxxxxxxxxxxxxx> wrote:
I'm noticing a problem with our test setup under (reasonably) heavy
read/write usage.  The problem we're having is that during an rsync of
content, the sync bails out because the mount is lost, with the
following errors:

<snip>
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trailers" failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed: Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed: Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed: Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed: Transport endpoint is not connected (107)
</snip>

Normal logging shows nothing on either the client or the server side,
but running with logging in DEBUG mode shows the following at the end
of the client log, right as it breaks:

<snip>
[Apr 02 13:25:11] [DEBUG/common-utils.c:213/gf_print_trace()] debug-backtrace:Got signal (11), printing backtrace
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(gf_print_trace+0x1f) [0x2a9556030f]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/lib64/tls/libc.so.6 [0x35b992e2b0]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/lib64/tls/libpthread.so.0(__pthread_mutex_destroy+0) [0x35ba807ab0]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/cluster/afr.so [0x2a958b840c]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b06c2]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b3196]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(epoll_iteration+0xf8) [0x2a955616f8]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:[glusterfs] [0x4031b7]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x35b991c3fb]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()] debug-backtrace:[glusterfs] [0x402bba]
</snip>
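(If a symbolic backtrace of that afr.so frame would help, we're happy
to try to capture one.  A rough sketch of what we'd run, assuming the
glusterfs binary sits under the same /usr/local/glusterfs-mainline
prefix as the libraries and that the client still takes its spec file
via -f; the paths and core file name below are just placeholders:

    ulimit -c unlimited
    glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log -L DEBUG /mnt/test
    # ...after the crash:
    gdb /usr/local/glusterfs-mainline/sbin/glusterfs core.<pid>

and then "bt" at the gdb prompt.)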
The server log shows the following at the time it breaks:

<snip>
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()] protocol/server:cleaned up xl_private of 0x510470
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()] tcp/server:destroying transport object for 192.168.0.96:1012 (fd=8)
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()] protocol/server:cleaned up xl_private of 0x510160
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()] tcp/server:destroying transport object for 192.168.0.96:1013 (fd=7)
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()] libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()] protocol/server:cleaned up xl_private of 0x502300
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()] tcp/server:destroying transport object for 192.168.0.96:1014 (fd=4)
</snip>

We're using four bricks in this setup and, for the moment, just one
client (we'd like to scale to somewhere between 20-30 clients and 4-8
server bricks).  The same behavior is observed with or without any
combination of the performance translators, and with or without file
replication; the alu, random, and round-robin schedulers were all
tried in our testing.  (If it helps isolate things, we can strip the
stack down further still; see the sketch below.)  The systems in
question run CentOS 4.4.  These logs are from our 64-bit systems, but
we have seen exactly the same thing on the 32-bit ones as well.
GlusterFS looks like it could be a good fit for some of the
high-traffic domains we host, but unless we can resolve this issue
we'll have to continue using NFS.
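For what it's worth, stripping things down would mean mounting a
single brick straight through, with no afr, unify, or performance
translators in the stack at all.  The client spec for that would
collapse to something like the following (a reconstruction along the
lines of our existing config below, not a file we've run exactly
as-is):

##-- begin minimal client config (sketch)
volume test00.1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.91
  option remote-subvolume brick1
end-volume
##-- end minimal client config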
Our current server-side (brick) config consists of the following:

##-- begin server config
volume vol1
  type storage/posix
  option directory /vol/vol1/gfs
end-volume

volume vol2
  type storage/posix
  option directory /vol/vol2/gfs
end-volume

volume vol3
  type storage/posix
  option directory /vol/vol3/gfs
end-volume

volume brick1
  type performance/io-threads
  option thread-count 8
  subvolumes vol1
end-volume

volume brick2
  type performance/io-threads
  option thread-count 8
  subvolumes vol2
end-volume

volume brick3
  type performance/io-threads
  option thread-count 8
  subvolumes vol3
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option bind-address 10.88.188.91
  subvolumes brick1 brick2 brick3
  option auth.ip.brick1.allow 192.168.0.*
  option auth.ip.brick2.allow 192.168.0.*
  option auth.ip.brick3.allow 192.168.0.*
end-volume
##-- end server config

Our client config is as follows:

##-- begin client config
volume test00.1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.91
  option remote-subvolume brick1
end-volume

volume test00.2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.91
  option remote-subvolume brick2
end-volume

volume test00.3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.91
  option remote-subvolume brick3
end-volume

volume test01.1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.92
  option remote-subvolume brick1
end-volume

volume test01.2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.92
  option remote-subvolume brick2
end-volume

volume test01.3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.92
  option remote-subvolume brick3
end-volume

volume test02.1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.93
  option remote-subvolume brick1
end-volume

volume test02.2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.93
  option remote-subvolume brick2
end-volume

volume test02.3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.93
  option remote-subvolume brick3
end-volume

volume test03.1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.94
  option remote-subvolume brick1
end-volume

volume test03.2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.94
  option remote-subvolume brick2
end-volume

volume test03.3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.94
  option remote-subvolume brick3
end-volume

volume afr0
  type cluster/afr
  subvolumes test00.1 test01.2 test02.3
  option replicate *.html:3,*.db:1,*:3
end-volume

volume afr1
  type cluster/afr
  subvolumes test01.1 test02.2 test03.3
  option replicate *.html:3,*.db:1,*:3
end-volume

volume afr2
  type cluster/afr
  subvolumes test02.1 test03.2 test00.3
  option replicate *.html:3,*.db:1,*:3
end-volume

volume afr3
  type cluster/afr
  subvolumes test03.1 test00.2 test01.3
  option replicate *.html:3,*.db:1,*:3
end-volume

volume bricks
  type cluster/unify
  subvolumes afr0 afr1 afr2 afr3
  option readdir-force-success on
  option scheduler alu
  option alu.limits.min-free-disk 60GB
  option alu.limits.max-open-files 10000
  option alu.order disk-usage:read-usage:open-files-usage:write-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 2GB
  option alu.disk-usage.exit-threshold 60MB
  option alu.open-files-usage.entry-threshold 1024
  option alu.open-files-usage.exit-threshold 32
  option alu.stat-refresh.interval 10sec
  option alu.read-usage.entry-threshold 20%
  option alu.read-usage.exit-threshold 4%
  option alu.write-usage.entry-threshold 20%
  option alu.write-usage.exit-threshold 4%
end-volume
##-- end client config

~Shawn

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel