On Tuesday 08 January 2008 09:44:34, Anand Avati wrote:
> was it just one file being read/written? if not please use rr (a more
> deterministic scheduler) and share the numbers please.

I only ran the bonnie tests; the posting I put the link to wasn't by me.

Cheers, Sascha

> about the create and delete rate, client side afr is definitely faster
> since the create operations happen in parallel (w.r.t. the network, 1x the
> time), but if you have afr on the server, it happens serially across the
> machines (2x the time: 1x up to the 1st server, and 1x to the remaining
> N-1 servers).
>
> note that for such a configuration, unify is not needed, and removing
> unify and having plain AFR will increase your create rate further, since
> unify serializes creates across the namespace and storage.
>
> about throughput, the writes seem to be faster with server AFR (or did my
> eyes deceive me with that indentation?) and reads faster with client AFR.
>
> the faster writes might be because of a good job done by the full-duplex
> switch. when afr is on the client side, both copies go out on the same
> outbound channel of the client NIC, effectively writing the two copies
> serially, but when afr is on the server side, the replication copy uses
> the outbound channel of the server NIC while the main loop is, in
> parallel, fetching the next write block on the server's inbound channel.
> using io-threads between afr and protocol/client on the replication path
> might help further.
>
> the slower reads might be because... well, I'm still not sure. maybe you
> have secretly applied the striped readv patch for afr? :)
>
> avati
>
> 2008/1/8, Sascha Ottolski <ottolski@xxxxxx>:
> > On Tuesday 08 January 2008 06:06:30, Anand Avati wrote:
> > > > > Brandon,
> > > > > who does the copy is decided by where the AFR translator is
> > > > > loaded. if you have AFR loaded on the client side, then the client
> > > > > does the two writes. you can also have AFR loaded on the server
> > > > > side, and have the server do the replication. Translators can be
> > > > > loaded anywhere (client or server, anywhere in the graph). You
> > > > > need to think more along the lines of how you can 'program
> > > > > glusterfs' rather than how to 'configure glusterfs'.
> > > >
> > > > Performance-wise, which is better? Or does it make sense one way vs.
> > > > the other based on number of clients?
> > >
> > > Depends. if the interconnect between server and client is precious,
> > > then have the servers replicate (load afr on the server side) with
> > > replication happening on a separate network. This is also good if you
> > > have servers interconnected with high speed networks like infiniband.
> > >
> > > If your servers have just one network interface (no separate network
> > > for replication), and your client apps are IO bound, then it does not
> > > matter where you load AFR; they all would give the same performance.
> > >
> > > avati
> >
> > I did a simple test recently which suggests that there is a significant
> > performance difference: I compared client- vs. server-side afr with
> > bonnie, for a one-client/two-server setup with tla patch628, connected
> > over GB Ethernet; please see my results below.
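
A note on the rr suggestion above: if I rerun the tests with rr instead of
alu, I understand the only change would be the scheduler option in the unify
volume of the client volfiles quoted below, roughly like this (untested; the
rr defaults are left as they are):

volume bricks
  type cluster/unify
  subvolumes fsc1 fsc2            # afr1 afr2 in the client-side-afr variant
  option namespace afrns
  option scheduler rr             # round-robin instead of alu; the alu.* options go away
end-volume
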
> > There also was a posting on this list with a lot of test results,
> > suggesting that server-side afr is fastest:
> > http://lists.nongnu.org/archive/html/gluster-devel/2007-08/msg00136.html
> >
> > In my own results, though, client-side afr seems to be better in most of
> > the tests; I should note that I'm not sure whether the chosen setup (two
> > servers afr-ing each other) has a negative impact on performance, so any
> > comments on this would be highly appreciated (I've added the configs for
> > the tests below).
> >
> > server side afr (I hope it stays readable):
> >
> > Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine       Size  K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22    31968M  31438  43 35528   0   990   0 32375  43 41107   1  38.1   0
> >                     ------Sequential Create------ --------Random Create--------
> >                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >              files   /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                 16     34   0   416   0   190   0    35   0   511   0   227   0
> >
> > client side afr:
> >
> > Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
> >                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> > Machine       Size  K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> > stf-db22    31968M  27583  38 31518   0   862   0 49522  63 56388   2  28.0   0
> >                     ------Sequential Create------ --------Random Create--------
> >                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> >              files   /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> >                 16    418   0  2225   1   948   1   455   0  2305   1   947   0
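
Regarding Avati's point that unify is not needed here and that plain AFR
should raise the create rate further: if I read it right, the client volfile
could shrink to something like the following, mirroring just one brick per
server (a sketch only, not tested):

volume fsc1
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.10
  option remote-subvolume brick1
end-volume

volume fsc2r
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.10.1.99
  option remote-subvolume brick2
end-volume

volume afr1
  type cluster/afr
  subvolumes fsc1 fsc2r           # one mirrored pair, no unify and no namespace volume
end-volume
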
> >
> > server side afr config:
> >
> > glusterfs-server.vol.server_afr:
> >
> > volume fsbrick1
> >   type storage/posix
> >   option directory /data1
> > end-volume
> >
> > volume fsbrick2
> >   type storage/posix
> >   option directory /data2
> > end-volume
> >
> > volume nsfsbrick1
> >   type storage/posix
> >   option directory /data-ns1
> > end-volume
> >
> > volume brick1
> >   type performance/io-threads
> >   option thread-count 8
> >   option queue-limit 1024
> >   subvolumes fsbrick1
> > end-volume
> >
> > volume brick2
> >   type performance/io-threads
> >   option thread-count 8
> >   option queue-limit 1024
> >   subvolumes fsbrick2
> > end-volume
> >
> > volume brick1r
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume afr1
> >   type cluster/afr
> >   subvolumes brick1 brick1r
> >   # option replicate *:2           # obsolete with tla snapshot
> > end-volume
> >
> > ### Add network serving capability to above bricks.
> > volume server
> >   type protocol/server
> >   option transport-type tcp/server   # For TCP/IP transport
> >   option listen-port 6996            # Default is 6996
> >   option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> >   subvolumes afr1 nsfsbrick1
> >   option auth.ip.afr1.allow *        # Allow access to "brick" volume
> >   option auth.ip.brick2.allow *      # Allow access to "brick" volume
> >   option auth.ip.nsfsbrick1.allow *  # Allow access to "brick" volume
> > end-volume
> >
> > glusterfs-client.vol.test.server_afr:
> >
> > volume fsc1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.10
> >   option remote-subvolume afr1
> > end-volume
> >
> > volume fsc2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume afr1
> > end-volume
> >
> > volume ns1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.10
> >   option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume ns2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume afrns
> >   type cluster/afr
> >   subvolumes ns1 ns2
> > end-volume
> >
> > volume bricks
> >   type cluster/unify
> >   subvolumes fsc1 fsc2
> >   option namespace afrns
> >   option scheduler alu
> >   option alu.limits.min-free-disk 5%          # Stop creating files when free-space lt 5%
> >   option alu.limits.max-open-files 10000
> >   option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> >   option alu.disk-usage.entry-threshold 2GB   # Units in KB, MB and GB are allowed
> >   option alu.disk-usage.exit-threshold 60MB   # Units in KB, MB and GB are allowed
> >   option alu.open-files-usage.entry-threshold 1024
> >   option alu.open-files-usage.exit-threshold 32
> >   option alu.stat-refresh.interval 10sec
> > end-volume
> >
> > volume readahead
> >   type performance/read-ahead
> >   option page-size 256KB
> >   option page-count 2
> >   subvolumes bricks
> > end-volume
> >
> > volume write-behind
> >   type performance/write-behind
> >   option aggregate-size 1MB
> >   subvolumes readahead
> > end-volume
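
For completeness, Avati's suggestion of loading io-threads between afr and
protocol/client on the replication path would, as far as I understand it,
mean wrapping brick1r in the server volfile above. Something like this
(untested; the brick1r-threads name and the thread-count value are made up):

volume brick1r-threads
  type performance/io-threads
  option thread-count 4             # made-up value
  subvolumes brick1r                # the protocol/client volume carrying the replica
end-volume

volume afr1
  type cluster/afr
  subvolumes brick1 brick1r-threads
end-volume
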
> >
> > -----------------------------------------------------------------------
> >
> > client side afr config:
> >
> > glusterfs-server.vol.client_afr:
> >
> > volume fsbrick1
> >   type storage/posix
> >   option directory /data1
> > end-volume
> >
> > volume fsbrick2
> >   type storage/posix
> >   option directory /data2
> > end-volume
> >
> > volume nsfsbrick1
> >   type storage/posix
> >   option directory /data-ns1
> > end-volume
> >
> > volume brick1
> >   type performance/io-threads
> >   option thread-count 8
> >   option queue-limit 1024
> >   subvolumes fsbrick1
> > end-volume
> >
> > volume brick2
> >   type performance/io-threads
> >   option thread-count 8
> >   option queue-limit 1024
> >   subvolumes fsbrick2
> > end-volume
> >
> > ### Add network serving capability to above bricks.
> > volume server
> >   type protocol/server
> >   option transport-type tcp/server   # For TCP/IP transport
> >   option listen-port 6996            # Default is 6996
> >   option client-volume-filename /etc/glusterfs/glusterfs-client.vol
> >   subvolumes brick1 brick2 nsfsbrick1
> >   option auth.ip.brick1.allow *      # Allow access to "brick" volume
> >   option auth.ip.brick2.allow *      # Allow access to "brick" volume
> >   option auth.ip.nsfsbrick1.allow *  # Allow access to "brick" volume
> > end-volume
> >
> > glusterfs-client.vol.test.client_afr:
> >
> > volume fsc1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.10
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume fsc1r
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.10
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume fsc2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume brick1
> > end-volume
> >
> > volume fsc2r
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume brick2
> > end-volume
> >
> > volume afr1
> >   type cluster/afr
> >   subvolumes fsc1 fsc2r
> >   # option replicate *:2           # obsolete with tla snapshot
> > end-volume
> >
> > volume afr2
> >   type cluster/afr
> >   subvolumes fsc2 fsc1r
> >   # option replicate *:2           # obsolete with tla snapshot
> > end-volume
> >
> > volume ns1
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.10
> >   option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume ns2
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host 10.10.1.99
> >   option remote-subvolume nsfsbrick1
> > end-volume
> >
> > volume afrns
> >   type cluster/afr
> >   subvolumes ns1 ns2
> >   # option replicate *:2           # obsolete with tla snapshot
> > end-volume
> >
> > volume bricks
> >   type cluster/unify
> >   subvolumes afr1 afr2
> >   option namespace afrns
> >   option scheduler alu
> >   option alu.limits.min-free-disk 5%          # Stop creating files when free-space lt 5%
> >   option alu.limits.max-open-files 10000
> >   option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
> >   option alu.disk-usage.entry-threshold 2GB   # Units in KB, MB and GB are allowed
> >   option alu.disk-usage.exit-threshold 60MB   # Units in KB, MB and GB are allowed
> >   option alu.open-files-usage.entry-threshold 1024
> >   option alu.open-files-usage.exit-threshold 32
> >   option alu.stat-refresh.interval 10sec
> > end-volume
> >
> > volume readahead
> >   type performance/read-ahead
> >   option page-size 256KB
> >   option page-count 2
> >   subvolumes bricks
> > end-volume
> >
> > volume write-behind
> >   type performance/write-behind
> >   option aggregate-size 1MB
> >   subvolumes readahead
> > end-volume
> >
> >
> > Cheers, Sascha
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel