Does this get better if you use the --bwlimit flag on rsync?
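For example, something along these lines would cap the transfer at roughly 5 MB/s (the limit value and paths are just illustrative):

  rsync -av --bwlimit=5000 /staging/sites/ /mnt/glusterfs/sites/

Throttling the transfer would at least tell us whether the problem is load-related.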
On Apr 2, 2007, at 3:46 PM, Shawn Northart wrote:
I'm noticing a problem with our test setup under (reasonably) heavy read/write usage.
The problem we're having is that during an rsync of content, the sync bails because the mount is lost, with the following errors:
<snip>
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trailers" failed:
Transport endpoint is not connected (107)
rsync: recv_generator: mkdir
"/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed: Transport
endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember" failed:
Transport endpoint is not connected (107)
rsync: recv_generator: mkdir
"/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux" failed:
Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/bardoux"
failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir
"/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images" failed:
Transport endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/trialmember/images"
failed: Transport endpoint is not connected (107)
rsync: recv_generator: mkdir
"/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers" failed:
Transport
endpoint is not connected (107)
rsync: stat "/vol/vol0/sites/TESTSITE.com/htdocs/upgrade_trailers"
failed: Transport endpoint is not connected (107)
</snip>
Normal logging shows nothing on either the client or server side, but running with DEBUG logging shows the following at the end of the client log right as it breaks:
<snip>
[Apr 02 13:25:11] [DEBUG/common-utils.c:213/gf_print_trace()]
debug-backtrace:Got signal (11), printing backtrace
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0
(gf_print_trace+0x1f) [0x2a9556030f]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/lib64/tls/libc.so.6 [0x35b992e2b0]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/lib64/tls/libpthread.so.0(__pthread_mutex_destroy+0)
[0x35ba807ab0]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-
pre2.2/xlator/cluster/afr.so [0x2a958b840c]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-
pre2.2/xlator/protocol/client.so [0x2a957b06c2]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/usr/local/glusterfs-mainline/lib/glusterfs/1.3.0-
pre2.2/xlator/protocol/client.so [0x2a957b3196]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/usr/local/glusterfs-mainline/lib/libglusterfs.so.0
(epoll_iteration+0xf8) [0x2a955616f8]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:[glusterfs] [0x4031b7]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:/lib64/tls/libc.so.6(__libc_start_main+0xdb)
[0x35b991c3fb]
[Apr 02 13:25:11] [DEBUG/common-utils.c:215/gf_print_trace()]
debug-backtrace:[glusterfs] [0x402bba]
</snip>
The server log shows the following at the time it breaks:
<snip>
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09]
[DEBUG/protocol.c:244/gf_block_unserialize_transport()]
libglusterfs/protocol:gf_block_unserialize_transport: full_read of
header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
protocol/server:cleaned up xl_private of 0x510470
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
tcp/server:destroying transport object for 192.168.0.96:1012 (fd=8)
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09]
[DEBUG/protocol.c:244/gf_block_unserialize_transport()]
libglusterfs/protocol:gf_block_unserialize_transport: full_read of
header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
protocol/server:cleaned up xl_private of 0x510160
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
tcp/server:destroying transport object for 192.168.0.96:1013 (fd=7)
[Apr 02 15:30:09] [ERROR/common-utils.c:54/full_rw()]
libglusterfs:full_rw: 0 bytes r/w instead of 113
[Apr 02 15:30:09]
[DEBUG/protocol.c:244/gf_block_unserialize_transport()]
libglusterfs/protocol:gf_block_unserialize_transport: full_read of
header failed
[Apr 02 15:30:09] [DEBUG/proto-srv.c:2868/proto_srv_cleanup()]
protocol/server:cleaned up xl_private of 0x502300
[Apr 02 15:30:09] [DEBUG/tcp-server.c:243/gf_transport_fini()]
tcp/server:destroying transport object for 192.168.0.96:1014 (fd=4)
</snip>
We're using 4 bricks in this setup and, for the moment, just one client (we'd like to scale to between 20 and 30 clients and 4 to 8 server bricks).
The same behavior is observed with or without any combination of the performance translators, and with or without file replication. The alu, random, and round-robin schedulers were all used in our testing.
The systems in question are running CentOS 4.4. These logs are from our 64-bit systems, but we have seen exactly the same thing on the 32-bit ones as well.
GlusterFS looks like it could be a good fit for some of the high-traffic domains we host, but unless we can resolve this issue, we'll have to continue using NFS.
Our current server-side (brick) config consists of the following:
##-- begin server config
volume vol1
type storage/posix
option directory /vol/vol1/gfs
end-volume
volume vol2
type storage/posix
option directory /vol/vol2/gfs
end-volume
volume vol3
type storage/posix
option directory /vol/vol3/gfs
end-volume
volume brick1
type performance/io-threads
option thread-count 8
subvolumes vol1
end-volume
volume brick2
type performance/io-threads
option thread-count 8
subvolumes vol2
end-volume
volume brick3
type performance/io-threads
option thread-count 8
subvolumes vol3
end-volume
volume server
type protocol/server
option transport-type tcp/server
option bind-address 10.88.188.91
subvolumes brick1 brick2 brick3
option auth.ip.brick1.allow 192.168.0.*
option auth.ip.brick2.allow 192.168.0.*
option auth.ip.brick3.allow 192.168.0.*
end-volume
##-- end server config
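For reference, each brick runs the standalone glusterfsd daemon pointed at this spec file, roughly like so (the spec-file path here is just an example):

  glusterfsd -f /etc/glusterfs/glusterfs-server.vol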
Our client config is as follows:
##-- begin client config
volume test00.1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.91
option remote-subvolume brick1
end-volume
volume test00.2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.91
option remote-subvolume brick2
end-volume
volume test00.3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.91
option remote-subvolume brick3
end-volume
volume test01.1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.92
option remote-subvolume brick1
end-volume
volume test01.2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.92
option remote-subvolume brick2
end-volume
volume test01.3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.92
option remote-subvolume brick3
end-volume
volume test02.1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.93
option remote-subvolume brick1
end-volume
volume test02.2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.93
option remote-subvolume brick2
end-volume
volume test02.3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.93
option remote-subvolume brick3
end-volume
volume test03.1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.94
option remote-subvolume brick1
end-volume
volume test03.2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.94
option remote-subvolume brick2
end-volume
volume test03.3
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.94
option remote-subvolume brick3
end-volume
volume afr0
type cluster/afr
subvolumes test00.1 test01.2 test02.3
option replicate *.html:3,*.db:1,*:3
end-volume
volume afr1
type cluster/afr
subvolumes test01.1 test02.2 test03.3
option replicate *.html:3,*.db:1,*:3
end-volume
volume afr2
type cluster/afr
subvolumes test02.1 test03.2 test00.3
option replicate *.html:3,*.db:1,*:3
end-volume
volume afr3
type cluster/afr
subvolumes test03.1 test00.2 test01.3
option replicate *.html:3,*.db:1,*:3
end-volume
volume bricks
type cluster/unify
subvolumes afr0 afr1 afr2 afr3
option readdir-force-success on
option scheduler alu
option alu.limits.min-free-disk 60GB
option alu.limits.max-open-files 10000
option alu.order disk-usage:read-usage:open-files-usage:write-usage:disk-speed-usage
option alu.disk-usage.entry-threshold 2GB
option alu.disk-usage.exit-threshold 60MB
option alu.open-files-usage.entry-threshold 1024
option alu.open-files-usage.exit-threshold 32
option alu.stat-refresh.interval 10sec
option alu.read-usage.entry-threshold 20%
option alu.read-usage.exit-threshold 4%
option alu.write-usage.entry-threshold 20%
option alu.write-usage.exit-threshold 4%
end-volume
##-- end client config
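The client is mounted with the glusterfs binary pointed at this spec file, roughly like so (the spec-file path and mount point here are examples):

  glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs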
~Shawn