My .02 below...

-----Original Message-----
From: Ed Cashin [mailto:ecashin@xxxxxxxxxx]
Sent: Friday, September 6, 2013 10:11 AM
To: Derick Swanepoel
Cc: aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: Poor performance on 10 Gbps SAN

> That said, while checking the vblade README for the design goals, I noticed that it advertises
> a capacity for 16 outstanding commands. If you want to try some tuning, you could adjust
> Bufcount in dat.h and then make sure your settings in /proc are sufficient to allow the kernel
> to buffer 16 writes. (Read commands are small.)

Back when I was tuning AoE for our virtualization project, I learned the hard way to be careful with the aoe_maxout parameter of the Linux driver. E.g., if you have a single target and a single initiator, 16 may be appropriate, but if you have 4 hosts sharing one target, you can quickly overrun the command buffers if all hosts are doing I/O at once. I settled on aoe_maxout="8" as a compromise between stability and performance.

Long command queues can also wreak havoc with the Linux aoe driver's RTT calculations, leading to unnecessary retransmits (it's inevitable that average round-trip times go up as the queue length grows past the point of your array's ability to perform I/O in parallel). Retransmits will of course lower your throughput and decrease the efficiency of your network.

With hardware flow control, we found that packets are rarely, if ever, lost outright. The most likely scenario for losing commands is sending more to the target than it can queue at once, such as by overrunning the kernel socket buffers. As you're testing the stack, pay close attention to network statistics as well as block statistics.

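To put rough numbers on the aoe_maxout point above, here is a minimal sketch of the worst-case arithmetic. The function names are hypothetical, the capacity of 16 is the figure quoted from the vblade README (other targets may differ), and the 8704-byte payload is the maxbcnt value reported in the debug output in this message; it ignores headers and any per-packet kernel overhead, so treat the byte count as a lower bound only.

```python
def commands_in_flight(initiators, maxout):
    """Worst case: every initiator has filled its window toward one target."""
    return initiators * maxout


def exceeds_target(initiators, maxout, target_capacity=16):
    """True if the worst case is more than the target can queue at once.

    target_capacity=16 is an assumption taken from the quoted vblade README.
    """
    return commands_in_flight(initiators, maxout) > target_capacity


def write_payload_bytes(commands, bytes_per_command=8704):
    """Payload bytes the target side must buffer if every command is a write.

    bytes_per_command=8704 is an assumption (maxbcnt from the debug output).
    """
    return commands * bytes_per_command


if __name__ == "__main__":
    for hosts in (1, 4):
        n = commands_in_flight(hosts, 16)
        print(hosts, "host(s), maxout 16:", n, "commands,",
              write_payload_bytes(n), "payload bytes,",
              "exceeds 16" if exceeds_target(hosts, 16) else "fits in 16")
```

With one initiator at maxout 16 the window just fits; with four initiators at the same setting the worst case is 64 commands against a queue of 16, which is the overrun described above.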
I also found the "debug" output of the aoe driver useful, e.g.:

    # pwd
    /sys/block/etherd!e1.0
    # cat debug
    rttavg: 58042 rttdev: 58472
    nskbpool: 2
    kicked: 868170
    maxbcnt: 8704
    ref: 0
    falloc: 80
    ffree: ffff88005c36ad80
    003048b96515:1:8:8
            ssthresh:4
            lost:4133967
            taint:0
            r:1859180367
            w:643159042
            eth5
    falloc: 82
    ffree: ffff8800630d9a80
    003048b96514:3:8:8
            ssthresh:4
            lost:4122252
            taint:0
            r:1863836699
            w:655038947
            eth4

The driver source code is small and easy to read, and explains what each of these measurements means.

(In this example we have a pair of 1 Gbps links splitting the load. We've reached ~180 MB/s on sequential operations. Our aoe driver is v7.5, which was current at the time.)

-Jeff
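For watching these counters over time, the debug output quoted above can be split into per-device and per-target fields. This is a minimal sketch, not part of the driver or aoetools: the function name and dictionary keys are hypothetical, the parser assumes a new target record begins at each "falloc:" field (as in the sample), and it tokenizes on whitespace so it does not depend on where the line breaks fall. It makes no attempt to interpret the fields; the driver source remains the reference for what they mean.

```python
import re

# A target line looks like "003048b96515:1:8:8": a 12-digit hex address
# followed by three integers (left uninterpreted here).
TARGET_RE = re.compile(r"^([0-9a-fA-F]{12}):(\d+):(\d+):(\d+)$")

SAMPLE = """rttavg: 58042 rttdev: 58472 nskbpool: 2 kicked: 868170
maxbcnt: 8704 ref: 0
falloc: 80 ffree: ffff88005c36ad80 003048b96515:1:8:8
ssthresh:4 lost:4133967 taint:0 r:1859180367 w:643159042 eth5
falloc: 82 ffree: ffff8800630d9a80 003048b96514:3:8:8
ssthresh:4 lost:4122252 taint:0 r:1863836699 w:655038947 eth4"""


def _convert(value):
    """Return an int where possible, otherwise the original string."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_aoe_debug(text):
    """Return (device_fields, list_of_target_fields) from debug text."""
    device, targets = {}, []
    current = device
    tokens = text.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        match = TARGET_RE.match(tok)
        if match:
            current["addr"] = match.group(1)
            current["counts"] = [int(g) for g in match.groups()[1:]]
        elif tok.endswith(":") and i + 1 < len(tokens):
            key = tok[:-1]
            if key == "falloc":  # assumption: starts a new target record
                current = {}
                targets.append(current)
            current[key] = _convert(tokens[i + 1])
            i += 1  # skip the value token just consumed
        elif ":" in tok:
            key, value = tok.split(":", 1)
            current[key] = _convert(value)
        else:
            # Bare token: treated as interface name(s), comma-separated.
            current.setdefault("ifs", []).extend(tok.split(","))
        i += 1
    return device, targets


if __name__ == "__main__":
    dev, tgts = parse_aoe_debug(SAMPLE)
    print("rttavg", dev["rttavg"], "maxbcnt", dev["maxbcnt"])
    for t in tgts:
        print(t["ifs"], "lost", t["lost"], "r", t["r"], "w", t["w"])
```

Sampling the file at intervals and comparing successive "lost" values per interface, alongside the network statistics, is one way to see whether a change to aoe_maxout is helping.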