sorry, forgot that one.  the command used was:

    rsync -av --stats --progress --delete

i haven't tried setting a bwlimit yet, and i'd prefer not to if i can
avoid it.  i've got roughly 450GB of data to sync over, and the faster
i can do it, the better.  i will try it anyway, just to see whether it
makes any difference (rough sketch in the p.s. at the very bottom of
this mail).

the network is all copper gig, with both interfaces trunked and vlan'd
on both client and server.

a couple of other things that just came to mind: i didn't see this
exact behavior during the initial rsync.  i have three directories i'm
trying to sync; when they run concurrently, i see the problem, but when
run one at a time, each sync seems to complete without incident.  the
only difference in the command for that initial run was that i omitted
the --delete flag.

~Shawn

On Tue, 2007-04-03 at 11:07 +0530, Krishna Srinivas wrote:
> Hi Shawn,
>
> Can you give us the exact rsync command you used?
>
> Thanks
> Krishna
>
> On 4/3/07, Shawn Northart <shawn@xxxxxxxxxxxxxxxxxx> wrote:
> > I'm noticing a problem with our test setup with regard to
> > (reasonably) heavy read/write usage.
> > the problem we're having is that during an rsync of content, the
> > sync bails because the mount is lost, with the following errors:
> >
> > <snip>
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trailers" failed:
> > Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport
> > endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed:
> > Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed:
> > Transport endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux"
> > failed: Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed:
> > Transport endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images"
> > failed: Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed:
> > Transport endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers"
> > failed: Transport endpoint is not connected (107)
> > </snip>
> >
> > normal logging shows nothing on either the client or server side,
> > but logging in DEBUG mode shows the following at the end of the
> > client log, right as it breaks:
> >
> > <snip>
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:213/gf_print_trace()]
> > debug-backtrace:Got signal (11), printing backtrace
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(gf_print_trace+0x1f) [0x2a9556030f]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libc.so.6 [0x35b992e2b0]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libpthread.so.0(__pthread_mutex_destroy+0)
> > [0x35ba807ab0]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/cluster/afr.so [0x2a958b840c]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b06c2]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b3196]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(epoll_iteration+0xf8) [0x2a955616f8]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:[glusterfs] [0x4031b7]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x35b991c3fb]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:[glusterfs] [0x402bba]
> > </snip>
> >
> > the server log shows the following at the time it breaks:
> >
> > <snip>
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x510470
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1012 (fd=8)
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x510160
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1013 (fd=7)
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x502300
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1014 (fd=4)
> > </snip>
> >
> > we're using 4 bricks in this setup and, for the moment, just one
> > client (we'd like to scale to somewhere between 20-30 clients and
> > 4-8 server bricks).  the same behavior is observed with or without
> > any combination of the performance translators, and with or without
> > file replication.  the alu, random, and round-robin schedulers were
> > all used in our testing.
> > the systems in question are running CentOS 4.4.  these logs are
> > from our 64-bit systems, but we have seen exactly the same thing on
> > the 32-bit ones as well.
> > glusterfs looks like it could be a good fit for some of the
> > high-traffic domains we host, but unless we can resolve this issue,
> > we'll have to continue using NFS.
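
just to be concrete about the performance translators mentioned above:
in those runs we layered extra volumes over the "bricks" unify volume
at the end of the client spec quoted below, roughly along these lines.
this is a trimmed-down sketch from memory rather than the exact spec we
used, and the volume names here are placeholders:

    # placeholder names; the real spec also carried tuning options
    volume wb
      type performance/write-behind
      subvolumes bricks
    end-volume

    volume ra
      type performance/read-ahead
      subvolumes wb
    end-volume

as noted, we see the same disconnects with and without these in place.
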
> >
> >
> > our current server-side (brick) config consists of the following:
> >
> > ##-- begin server config
> > volume vol1
> >   type storage/posix
> >   option directory /vol/vol1/gfs
> > end-volume
> >
> > volume vol2
> >   type storage/posix
> >   option directory /vol/vol2/gfs
> > end-volume
> >
> > volume vol3
> >   type storage/posix
> >   option directory /vol/vol3/gfs
> > end-volume
> >
> > volume brick1
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol1
> > end-volume
> >
> > volume brick2
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol2
> > end-volume
> >
> > volume brick3
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol3
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   option transport-type tcp/server
> >   option bind-address 10.88.188.91
> >   subvolumes brick1 brick2 brick3
> >   option auth.ip.brick1.allow 192.168.0.*
> >   option auth.ip.brick2.allow 192.168.0.*
> >   option auth.ip.brick3.allow 192.168.0.*
> > end-volume
> > ##-- end server config
> >
> >
> > our client config is as follows:
> >
> > ##-- begin client config
> > volume test00.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test00.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test00.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test01.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test01.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test01.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test02.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test02.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test02.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test03.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test03.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test03.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume afr0
> >   type cluster/afr
> >   subvolumes test00.1 test01.2 test02.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr1
> >   type cluster/afr
> >   subvolumes test01.1 test02.2 test03.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr2
> >   type cluster/afr
> >   subvolumes test02.1 test03.2 test00.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr3
> >   type cluster/afr
> >   subvolumes test03.1 test00.2 test01.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume bricks
> >   type cluster/unify
> >   subvolumes afr0 afr1 afr2 afr3
> >   option readdir-force-success on
> >
> >   option scheduler alu
> >   option alu.limits.min-free-disk 60GB
> >   option alu.limits.max-open-files 10000
> >
> >   option alu.order disk-usage:read-usage:open-files-usage:write-usage:disk-speed-usage
> >
> >   option alu.disk-usage.entry-threshold 2GB
> >   option alu.disk-usage.exit-threshold 60MB
> >   option alu.open-files-usage.entry-threshold 1024
> >   option alu.open-files-usage.exit-threshold 32
> >   option alu.stat-refresh.interval 10sec
> >
> >   option alu.read-usage.entry-threshold 20%
> >   option alu.read-usage.exit-threshold 4%
> >   option alu.write-usage.entry-threshold 20%
> >   option alu.write-usage.exit-threshold 4%
> > end-volume
> > ##-- end client config
> >
> >
> > ~Shawn
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
>
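
p.s. for the record, the bwlimit test mentioned at the top will look
roughly like this -- the source paths and the 20000 KB/s cap are just
placeholders/first guesses, not our real layout:

    # run the three trees one at a time instead of concurrently,
    # and cap rsync's bandwidth (--bwlimit is in KBytes per second);
    # "siteN" and /src stand in for our real directories
    for dir in site1 site2 site3; do
        rsync -av --stats --progress --delete --bwlimit=20000 \
            /src/$dir/ /vol/vol0/sites/$dir/
    done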