Hi Shawn,

We committed a fix today which may address your problem; can you check
with the latest source? I tried with two instances of rsync but could
not reproduce the problem. If you still see it, can you give more
detailed steps to reproduce?

Thanks
Krishna
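P.S. For reference, this is roughly how I ran the two instances here (a
sketch; the source trees and mount point below are placeholder paths,
not from your setup):

    # two concurrent rsyncs into a glusterfs mount (hypothetical paths)
    rsync -av --stats --progress --delete /data/site1/ /mnt/glusterfs/site1/ &
    rsync -av --stats --progress --delete /data/site2/ /mnt/glusterfs/site2/ &
    wait
    # adding --bwlimit=10000 (KB/s) would also tell us whether throttling
    # the transfer masks the crash

If a third concurrent instance (or the --delete flag) is what triggers
it for you, that detail would help us reproduce.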
On 4/4/07, Shawn Northart <shawn@xxxxxxxxxxxxxxxxxx> wrote:

Sorry, forgot that one.  The command used was:

    rsync -av --stats --progress --delete

I haven't tried setting a bwlimit yet, and I'd prefer not to have to if
possible.  I've got roughly 450GB of data I want to sync over, and the
faster I can do it, the better.  I will try it just to see if it makes
things any better.  The network is all copper gig, with both interfaces
trunked and VLAN'd (on both client and server).

A couple of other things that just came to mind: I didn't see this exact
behavior during the initial rsync.  I have three directories I'm trying
to sync, and when they are run concurrently I see the problem; when run
one at a time, each sync seems to complete without incident.  The only
difference in the command for that initial run was that I omitted the
--delete flag.

~Shawn

On Tue, 2007-04-03 at 11:07 +0530, Krishna Srinivas wrote:
> Hi Shawn,
>
> Can you give us the exact rsync command you used?
>
> Thanks
> Krishna
>
> On 4/3/07, Shawn Northart <shawn@xxxxxxxxxxxxxxxxxx> wrote:
> > I'm noticing a problem with our test setup under (reasonably) heavy
> > read/write usage.  The problem we're having is that during an rsync
> > of content, the sync bails because the mount is lost, with the
> > following errors:
> >
> > <snip>
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trailers" failed:
> > Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport
> > endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed:
> > Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed:
> > Transport endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux"
> > failed: Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed:
> > Transport endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images"
> > failed: Transport endpoint is not connected (107)
> > rsync: recv_generator: mkdir
> > "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed: Transport
> > endpoint is not connected (107)
> > rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers"
> > failed: Transport endpoint is not connected (107)
> > </snip>
> >
> > (Error 107 is ENOTCONN: the FUSE mount has lost its connection to the
> > glusterfs client process.)
> >
> > Normal logging shows nothing on either the client or the server side,
> > but running logging in DEBUG mode shows the following at the end of
> > the client log right as it breaks:
> >
> > <snip>
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:213/gf_print_trace()]
> > debug-backtrace:Got signal (11), printing backtrace
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(gf_print_trace+0x1f) [0x2a9556030f]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libc.so.6 [0x35b992e2b0]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libpthread.so.0(__pthread_mutex_destroy+0)
> > [0x35ba807ab0]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/cluster/afr.so [0x2a958b840c]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b06c2]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-pre2.2/xlator/protocol/client.so [0x2a957b3196]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0(epoll_iteration+0xf8) [0x2a955616f8]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:[glusterfs] [0x4031b7]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:/lib64/tls/libc.so.6(__libc_start_main+0xdb)
> > [0x35b991c3fb]
> > [Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
> > debug-backtrace:[glusterfs] [0x402bba]
> > </snip>
> >
> > ("Got signal (11)" is the glusterfs client segfaulting, and the afr.so
> > frame in the backtrace suggests the fault is inside the afr translator;
> > that would explain the mount dropping out from under rsync.)
> >
> > The server log shows the following at the time it breaks:
> >
> > <snip>
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x510470
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1012 (fd=8)
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x510160
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1013 (fd=7)
> > [Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
> > libglusterfs:full_rw: 0 bytes r/w instead of 113
> > [Apr 02 15:30:09]
> > [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> > libglusterfs/protocol:gf_block_unserialize_transport: full_read of
> > header failed
> > [Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
> > protocol/server:cleaned up xl_private of 0x502300
> > [Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
> > tcp/server:destroying transport object for 192.168.0.96:1014 (fd=4)
> > </snip>
> >
> > (So the server side just sees the client's TCP connections drop, which
> > is consistent with a client-side crash.)
> >
> > We're using 4 bricks in this setup and, for the moment, just one
> > client (we would like to scale to 20-30 clients and 4-8 server
> > bricks).  The same behavior is observed with or without any
> > combination of the performance translators, and with or without file
> > replication.  The alu, random, and round-robin schedulers were all
> > used in our testing.
> >
> > The systems in question are running CentOS 4.4.  These logs are from
> > our 64-bit systems, but we have seen exactly the same thing on the
> > 32-bit ones as well.
> >
> > This (glusterfs) looks like it could be a good fit for some of the
> > high-traffic domains we host, but unless we can resolve this issue
> > we'll have to continue using NFS.
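> >
> > For reference, the client-side DEBUG log above was captured by
> > remounting with debug logging, along these lines (a sketch; the
> > spec-file path and mount point are placeholders, and the exact option
> > names should be checked against glusterfs --help for this build):
> >
> >     # remount the client with debug logging (hypothetical paths)
> >     umount /mnt/glusterfs
> >     glusterfs -f /etc/glusterfs/glusterfs-client.vol \
> >               --log-file=/var/log/glusterfs/client.log \
> >               --log-level=DEBUG /mnt/glusterfs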
> >
> > Our current server-side (brick) config consists of the following:
> >
> > ##-- begin server config
> > volume vol1
> >   type storage/posix
> >   option directory /vol/vol1/gfs
> > end-volume
> >
> > volume vol2
> >   type storage/posix
> >   option directory /vol/vol2/gfs
> > end-volume
> >
> > volume vol3
> >   type storage/posix
> >   option directory /vol/vol3/gfs
> > end-volume
> >
> > volume brick1
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol1
> > end-volume
> >
> > volume brick2
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol2
> > end-volume
> >
> > volume brick3
> >   type performance/io-threads
> >   option thread-count 8
> >   subvolumes vol3
> > end-volume
> >
> > volume server
> >   type protocol/server
> >   option transport-type tcp/server
> >   option bind-address 10.88.188.91
> >   subvolumes brick1 brick2 brick3
> >   option auth.ip.brick1.allow 192.168.0.*
> >   option auth.ip.brick2.allow 192.168.0.*
> >   option auth.ip.brick3.allow 192.168.0.*
> > end-volume
> > ##-- end server config
> >
> > Our client config is as follows:
> >
> > ##-- begin client config
> > volume test00.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test00.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test00.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.91
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test01.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test01.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test01.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.92
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test02.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test02.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test02.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.93
> >   option remote-subvolume brick3
> > end-volume
> >
> > volume test03.1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume test03.2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume test03.3
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 192.168.0.94
> >   option remote-subvolume brick3
> > end-volume
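> >
> > # Note on the afr volumes below: each one replicates across one brick
> > # from each of three of the four servers, staggered so every server
> > # carries a share of three replica sets.  As we understand the afr
> > # "replicate" option, each pattern:count pair sets how many copies are
> > # kept for files matching that glob, so *.html:3,*.db:1,*:3 keeps
> > # three copies of .html files, one copy of .db files, and three copies
> > # of everything else.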
> > volume afr0
> >   type cluster/afr
> >   subvolumes test00.1 test01.2 test02.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr1
> >   type cluster/afr
> >   subvolumes test01.1 test02.2 test03.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr2
> >   type cluster/afr
> >   subvolumes test02.1 test03.2 test00.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume afr3
> >   type cluster/afr
> >   subvolumes test03.1 test00.2 test01.3
> >   option replicate *.html:3,*.db:1,*:3
> > end-volume
> >
> > volume bricks
> >   type cluster/unify
> >   subvolumes afr0 afr1 afr2 afr3
> >   option readdir-force-success on
> >
> >   option scheduler alu
> >   option alu.limits.min-free-disk 60GB
> >   option alu.limits.max-open-files 10000
> >
> >   option alu.order disk-usage:read-usage:open-files-usage:write-usage:disk-speed-usage
> >
> >   option alu.disk-usage.entry-threshold 2GB
> >   option alu.disk-usage.exit-threshold 60MB
> >   option alu.open-files-usage.entry-threshold 1024
> >   option alu.open-files-usage.exit-threshold 32
> >   option alu.stat-refresh.interval 10sec
> >
> >   option alu.read-usage.entry-threshold 20%
> >   option alu.read-usage.exit-threshold 4%
> >   option alu.write-usage.entry-threshold 20%
> >   option alu.write-usage.exit-threshold 4%
> > end-volume
> > ##-- end client config
> >
> > ~Shawn

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel