On Aug 7, 2008, at 2:04 AM, Keith Freedman wrote:

> so server side afr takes 220% longer than client side AFR
>
> If all my assumptions are true, what might solve some of the
> problem (this would help both client side and server side) is to
> use additional network ports. Either the server replicates over a
> different port or the client talks to the 2 servers over different
> ports.

It's not a complex test. It's not a complex setup. A GlusterFS-mounted
partition set up with client-side AFR, using the previously listed
hardware over a dedicated GigE port, was used to unpack:

$ ls -al linux-2.6.26.1.tar.bz2
-rw-r--r-- 1 daviesinc daviesinc 49459141 2008-08-01 19:04 linux-2.6.26.1.tar.bz2

The system was not I/O-bound on the network, nor CPU-bound. Neither
server's CPU went above 3% for either gluster process, and the network
barely showed any activity, staying under 12mb/second.

> It would be interesting for you to rerun your tests with a multi-nic
> configuration in both scenarios.
>
> It's safe to assume that at any speed, more is better :)

So you believe that to untar/unbz2 a 49MB file in under 17 minutes, I
need to bond 2 GigE connections?

> Depending on your port speeds, which I don't recall, but I think you
> provided, your hardware disk configuration won't matter. With 100BaseT
> you can probably do just as well with a single drive as with a raid
> 0, 1, or 0+1. With 1000BaseT or faster you will want a drive
> configuration that can sustain the data transfer you'll be needing.

This is done under client-side AFR, so the file is written to both
machines. A 4.3GB file almost hits wire speed between the nodes.

$ dd if=/dev/zero of=three bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 37.3989 s, 115 MB/s

$ time cp linux-2.6.26.1.tar.bz2 linux-2.6.26.1.tar.bz2.copy

real    0m0.573s
user    0m0.000s
sys     0m0.052s

I can copy the 49MB file in 0.57 seconds. During the tar xjf, switch
stats show almost 500 pps and bandwidth barely reaches 4mb/sec
(megabits); top shows glusterfs and glusterfsd at 1% of the CPU each,
bzip2 at roughly 2%, and tar rarely shows up at all -- when it does,
it's very close to the bottom of the page at 1%.

$ time tar xjf linux-2.6.26.1.tar.bz2

real    18m6.799s
user    0m12.877s
sys     0m1.416s

I'm not convinced that this is a network or hardware problem.
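A quick way to isolate per-file create latency from raw throughput -- a
rough sketch, not something from the runs above; the createtest
directories and the count of 1000 are arbitrary, and /tmp is only there
as a local baseline -- is to time a batch of empty-file creates on the
AFR mount against a local directory:

$ mkdir /gfs/test/createtest /tmp/createtest
$ time (for i in $(seq 1 1000); do touch /gfs/test/createtest/$i; done)
$ time (for i in $(seq 1 1000); do touch /tmp/createtest/$i; done)

If the first loop is orders of magnitude slower while the CPUs and the
link stay idle, the cost is in per-file round trips rather than in
bandwidth, which is what the untar numbers already suggest.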
>
> Hope that wasn't confusing.
>
> At 10:05 PM 8/6/2008, Chris Davies wrote:
>> A continuation:
>>
>> I used XFS & MD raid 1 on the partitions for the initial tests.
>> I tested reiser3 and reiser4 with no significant difference.
>> I re-raided to MD raid 0 with XFS and received some improvement.
>>
>> I NFS-mounted the partition and received bonnie++ numbers similar to
>> the best clientside AFR numbers I have been able to get, but
>> unpacking the kernel using nfsv4/udp took 1 minute 47 seconds,
>> compared with 12 seconds for the bare drive, 41 seconds for
>> serverside AFR, and an average of 17 minutes for clientside AFR.
>>
>> If I turn off AFR, whether I mount the remote machine over the net
>> or use the local server's brick, tar xjf of a kernel takes roughly
>> 29 seconds.
>>
>> Large files replicate almost at wire speed. rsync/cp -Rp of a large
>> directory takes considerable time.
>>
>> Both QA releases of 1.4.0 I've attempted (1.4.0qa32 and 1.4.0qa33)
>> have broken within minutes using my configurations. I'll turn debug
>> logs on and post summaries of those.
>>
>> On Aug 6, 2008, at 2:48 PM, Chris Davies wrote:
>>
>> > OS: Debian Linux/4.1, 64bit build
>> > Hardware: quad core xeon x3220, 8gb RAM, dual 7200RPM 1000gb WD
>> > hard drives, 750gb raid 1 partition set as /gfsvol to be exported,
>> > dual gigE, juniper ex3200 switch
>> >
>> > Fuse libraries: fuse-2.7.3glfs10
>> > Gluster: glusterfs-1.3.10
>> >
>> > Running bonnie++ on both machines results in almost identical
>> > numbers; eth1 is reserved wholly for server-to-server
>> > communications. Right now, the only load on these machines comes
>> > from my testbed. There are four tests that give a reasonable
>> > indicator of performance:
>> >
>> > * loading a wordpress blog and looking at the line:
>> >   <!-- 24 queries. 0.634 seconds. -->
>> > * dd if=/dev/zero of=/gfs/test/out bs=1M count=512
>> > * time tar xjf /gfs/test/linux-2.6.26.1.tar.bz2
>> > * /usr/sbin/bonnie++ /gfs/test/
>> >
>> > On the wordpress test, .3 seconds is typical. On various gluster
>> > configurations I've received between .411 seconds (server side afr
>> > config below) and 1.2 seconds with some of the example
>> > configurations. Currently, my clientside AFR config comes in at
>> > .5xx seconds rather consistently.
>> >
>> > The second test on the clientside AFR results in 536870912 bytes
>> > (537 MB) copied, 4.65395 s, 115 MB/s
>> >
>> > The third test is unpacking a kernel, which has ranged from 28
>> > seconds using the serverside AFR to 6+ minutes on some
>> > configurations. Currently the clientside AFR config comes in at
>> > about 17 minutes.
>> >
>> > The fourth test is a run of bonnie++, which varies from 36 minutes
>> > on the serverside AFR to the 80 minute run on the clientside AFR
>> > config.
>> >
>> > The current test environment uses both servers as clients &
>> > servers -- if I can get reasonable performance, the existing
>> > machines will become clients and the servers will be split to
>> > their own platform, so I want to make sure I am using tcp for
>> > connections to give as close to a real world deployment as
>> > possible. This means I cannot run a client-only config.
>> >
>> > Baseline Wordpress returns .311-.399 seconds
>> > Baseline dd: 536870912 bytes (537 MB) copied, 0.489522 s, 1.1 GB/s
>> > Baseline tar xjf of the kernel: real 0m12.164s
>> > Baseline bonnie++ run on the raid 1 partition (echo data |
>> > bon_csv2txt for the text reporting):
>> >
>> > c1ws1,16G,66470,97,93198,16,42430,6,60253,86,97153,7,381.3,0,16,7534,37,+++++,+++,5957,23,7320,34,+++++,+++,4667,21
>> >
>> > So far, the best performance I could manage was server-side AFR
>> > with writebehind/readahead on the server, with aggregate-size set
>> > to 0MB, and the client side running writebehind/readahead. That
>> > resulted in:
>> >
>> > c1ws2,16G,37636,50,76855,3,17429,2,60376,76,87653,3,158.6,0,16,1741,3,9683,6,2591,3,2030,3,9790,5,2369,3
>> >
>> > It was suggested in IRC that clientside AFR would be faster and
>> > more reliable; however, I've ended up with the following as the
>> > best results from multiple attempts:
>> >
>> > c1ws1,16G,46041,58,76811,2,4603,0,59140,76,86103,3,132.4,0,16,1069,2,4795,2,1308,2,1045,2,5209,2,1246,2
>> >
>> > The bonnie++ run from the serverside AFR that produced the best
>> > results I've received to date took 34 minutes. The latest
>> > clientside AFR bonnie run took 80 minutes. Based on the website, I
>> > would expect to see better performance than drbd/GFS, but so far
>> > that hasn't been the case.
>> >
>> > It's been suggested that I use unify-rr-afr. In my current setup,
>> > it seems that to do that, I would need to break my raid set, which
>> > is my next step in debugging this. Rather than use raid 1 on the
>> > server, I would have 2 bricks on each server, which would allow
>> > the use of unify and the rr scheduler.
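For reference, the unify-rr-afr layout that suggestion points at would
look roughly like the sketch below. It is untested and the names are
placeholders: brick1a/brick1b and brick2a/brick2b stand for the two
bricks each server would export after breaking the raid set, and
ns1/ns2 for a small mirrored namespace directory, all defined as
protocol/client volumes the same way brick1 and brick2 are in the
client config further down.

volume afr-a
  type cluster/afr
  subvolumes brick1a brick2a      # first brick from each server, mirrored
end-volume

volume afr-b
  type cluster/afr
  subvolumes brick1b brick2b      # second brick from each server, mirrored
end-volume

volume afr-ns
  type cluster/afr
  subvolumes ns1 ns2              # namespace directories, also mirrored
end-volume

volume unify0
  type cluster/unify
  option scheduler rr             # round-robin new file creates across afr-a and afr-b
  option namespace afr-ns
  subvolumes afr-a afr-b
end-volume

As I understand it, the unify namespace sees a create for every file as
well, so it is worth keeping on a small, fast volume.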
>> >
>> > glusterfs-1.4.0qa32 results in
>> >
>> > [Wed Aug 06 02:01:44 2008] [notice] child pid 14025 exit signal Bus error (7)
>> > [Wed Aug 06 02:01:44 2008] [notice] child pid 14037 exit signal Bus error (7)
>> >
>> > when apache (not mod_gluster) tries to serve files off the
>> > glusterfs partition.
>> >
>> > The main issue I'm having right now is file creation speed. I
>> > realize that to create a file I need to do two network ops for
>> > each file created, but it seems that something is horribly wrong
>> > in my configuration, given the results of untarring the kernel.
>> >
>> > I've tried moving the performance translators around, but some
>> > don't seem to make much difference on the server side, and the
>> > ones that appear to make some difference on the client side don't
>> > seem to help the file creation issue.
>> >
>> > On a side note, zresearch.com: I emailed through your contact form
>> > and haven't heard back -- please provide a quote for generating
>> > the configuration and contact me offlist.
>> >
>> > === /etc/gluster/gluster-server.vol
>> >
>> > volume posix
>> >   type storage/posix
>> >   option directory /gfsvol/data
>> > end-volume
>> >
>> > volume plocks
>> >   type features/posix-locks
>> >   subvolumes posix
>> > end-volume
>> >
>> > volume writebehind
>> >   type performance/write-behind
>> >   option flush-behind off          # default is 'off'
>> >   subvolumes plocks
>> > end-volume
>> >
>> > volume readahead
>> >   type performance/read-ahead
>> >   option page-size 128kB           # default is 256KB
>> >   option page-count 4              # default is 2
>> >   option force-atime-update off    # default is off
>> >   subvolumes writebehind
>> > end-volume
>> >
>> > volume brick
>> >   type performance/io-threads
>> >   option thread-count 4            # default is 1
>> >   option cache-size 64MB
>> >   subvolumes readahead
>> > end-volume
>> >
>> > volume server
>> >   type protocol/server
>> >   option transport-type tcp/server
>> >   subvolumes brick
>> >   option auth.ip.brick.allow 10.8.1.*,127.0.0.1
>> > end-volume
>> >
>> > === /etc/glusterfs/gluster-client.vol
>> >
>> > volume brick1
>> >   type protocol/client
>> >   option transport-type tcp/client   # for TCP/IP transport
>> >   option remote-host 10.8.1.9        # IP address of server1
>> >   option remote-subvolume brick      # name of the remote volume on server1
>> > end-volume
>> >
>> > volume brick2
>> >   type protocol/client
>> >   option transport-type tcp/client   # for TCP/IP transport
>> >   option remote-host 10.8.1.10       # IP address of server2
>> >   option remote-subvolume brick      # name of the remote volume on server2
>> > end-volume
>> >
>> > volume afr
>> >   type cluster/afr
>> >   subvolumes brick1 brick2
>> > end-volume
>> >
>> > volume writebehind
>> >   type performance/write-behind
>> >   option aggregate-size 0MB
>> >   option flush-behind off          # default is 'off'
>> >   subvolumes afr
>> > end-volume
>> >
>> > volume readahead
>> >   type performance/read-ahead
>> >   option page-size 128kB           # default is 256KB
>> >   option page-count 4              # default is 2
>> >   option force-atime-update off    # default is off
>> >   subvolumes writebehind
>> > end-volume
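One variation on the client-side write-behind volume above that might
be worth trying for the small-file case is letting it actually
aggregate and flush behind instead of disabling both. This is only a
sketch; flush-behind and aggregate-size are the options already shown
above, but the 1MB value is a guess rather than a tested setting.

volume writebehind
  type performance/write-behind
  option aggregate-size 1MB    # 0MB above effectively disables aggregation; 1MB is a guess
  option flush-behind on       # lets close() return while the flush completes in the background
  subvolumes afr
end-volume

Whether this helps the untar case is an open question, since the create
itself still costs a round trip to each server.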
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users at gluster.org
>> > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users