Re: Poor performance on 10 Gbps SAN

My .02 below...

-----Original Message-----
From: Ed Cashin [mailto:ecashin@xxxxxxxxxx] 
Sent: Friday, September 6, 2013 10:11 AM
To: Derick Swanepoel
Cc: aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
Subject: Re:  Poor performance on 10 Gbps SAN

> That said, while checking the vblade README for the design goals, I noticed that it advertises
> a capacity for 16 outstanding commands.  If you want to try some tuning, you could adjust
> Bufcount in dat.h and then make sure your settings in /proc are sufficient to allow the kernel
> to buffer 16 writes.  (Read commands are small.)

Back when I was tuning AoE for our virtualization project, I learned the hard way to be careful with the aoe_maxout parameter of the Linux driver.  For example, with a single target and a single initiator, 16 may be appropriate, but if 4 hosts share one target, you can quickly overrun the command buffers when all of them are doing I/O at once.  I settled on aoe_maxout="8" as a compromise between stability and performance.
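To see why several hosts can overrun a target that one host cannot, it helps to do the worst-case arithmetic.  The sketch below assumes every initiator fills its aoe_maxout window at the same moment.  The target capacity of 16 mirrors the 16 outstanding commands the vblade README advertises (quoted above), and the 36-byte header figure is an assumption.  The helper names are hypothetical and nothing here talks to a real driver.

```python
# Worst-case queue budget for several AoE initiators sharing one target.
# target_slots and header_bytes are assumptions to replace with your own
# numbers; nothing here queries a real driver or target.


def aggregate_outstanding(hosts: int, maxout_per_host: int) -> int:
    """Commands in flight if every initiator fills its window at once."""
    return hosts * maxout_per_host


def fair_share_maxout(target_slots: int, hosts: int) -> int:
    """Largest per-host window that cannot overrun the target (min 1)."""
    return max(1, target_slots // hosts)


def write_buffer_bytes(outstanding: int, data_bytes: int = 8704,
                       header_bytes: int = 36) -> int:
    """Lower bound on bytes queued for `outstanding` write frames.

    data_bytes defaults to the maxbcnt shown in the debug output below;
    header_bytes assumes Ethernet (14) + AoE (10) + ATA (12) headers.
    Kernel per-packet bookkeeping adds to this, so treat it as a floor.
    """
    return outstanding * (data_bytes + header_bytes)


if __name__ == "__main__":
    target_slots = 16  # e.g. the 16 outstanding commands vblade advertises
    for hosts, maxout in [(1, 16), (4, 16), (4, 8)]:
        total = aggregate_outstanding(hosts, maxout)
        verdict = "fits" if total <= target_slots else "can overrun"
        print(f"{hosts} host(s) x aoe_maxout={maxout}: {total} commands, "
              f"{verdict}; >= {write_buffer_bytes(total)} bytes to buffer")
    print("fair share for 4 hosts:", fair_share_maxout(target_slots, 4))
```

The model is deliberately pessimistic.  With four hosts, even aoe_maxout=8 exceeds a 16-slot budget on paper, which is why such a value is a compromise rather than a guarantee.  The byte figure is what to compare against the socket receive-buffer limits under /proc/sys/net/core (rmem_default, rmem_max).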

Long command queues can also wreak havoc with the Linux aoe driver's RTT calculations, leading to unnecessary retransmits: average round-trip times inevitably rise once the queue grows beyond what your array can service in parallel.  Retransmits, of course, lower your throughput and reduce the efficiency of your network.  With hardware flow control, we found that very few packets, if any, are ever completely lost.  The most likely way to lose commands is to send the target more than it can queue at once, for example by overrunning the kernel socket buffers.
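The retransmit effect can be illustrated with a TCP-style (Jacobson/Karels, RFC 6298) round-trip estimator.  This is a stand-in, not the aoe driver's code: the driver keeps its own rttavg/rttdev with its own scaling, and the gains and "average + 4 x deviation" timeout below are TCP's classic values.  The sketch counts responses that arrive after the timer would already have fired, which corresponds to a needless resend.

```python
# Jacobson/Karels-style RTT estimator (the scheme TCP uses, RFC 6298),
# shown only to illustrate how a growing queue can trigger retransmits.
# The gains (1/8, 1/4) and the avg + 4*dev timeout are TCP's classic
# values, not necessarily what the aoe driver uses; units are arbitrary.


class RttEstimator:
    def __init__(self, first_sample: float) -> None:
        self.avg = first_sample        # smoothed round-trip time
        self.dev = first_sample / 2    # smoothed mean deviation

    def update(self, sample: float) -> None:
        err = sample - self.avg        # error against the old average
        self.avg += err / 8
        self.dev += (abs(err) - self.dev) / 4

    def timeout(self) -> float:
        return self.avg + 4 * self.dev


def count_late(samples: list) -> int:
    """Count responses that arrive after the retransmit timer would fire."""
    est = RttEstimator(samples[0])
    late = 0
    for sample in samples[1:]:
        if sample > est.timeout():
            late += 1      # the command would have been resent needlessly
        est.update(sample)
    return late


if __name__ == "__main__":
    short_queue = [100.0] * 20                            # array keeps up
    deep_queue = short_queue + [1000.0, 1200.0, 1500.0]   # queue piles up
    print("steady:", count_late(short_queue))
    print("after queue grows:", count_late(deep_queue))
```

A long run of steady samples shrinks the deviation term, so the first responses delayed by a deeper queue overshoot the timeout even though nothing was lost.  The estimator then catches up, which is why the damage comes in bursts.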

As you're testing the stack, pay close attention to network statistics as well as block statistics.  I also found the "debug" output of the aoe driver useful, e.g.:

# pwd
/sys/block/etherd!e1.0

# cat debug
rttavg: 58042 rttdev: 58472
nskbpool: 2
kicked: 868170
maxbcnt: 8704
ref: 0
falloc: 80
ffree: ffff88005c36ad80
003048b96515:1:8:8
        ssthresh:4
        lost:4133967
        taint:0
        r:1859180367
        w:643159042
        eth5
falloc: 82
ffree: ffff8800630d9a80
003048b96514:3:8:8
        ssthresh:4
        lost:4122252
        taint:0
        r:1863836699
        w:655038947
        eth4
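To track these counters over time instead of reading them by eye, a small parser helps.  The following is a hypothetical helper written against the layout of the sample output above.  Several field meanings are assumptions that should be checked against the driver source: that r and w count packets, and that the three numbers after the MAC address are nout:maxout:nframes.

```python
import re
import sys

# Hypothetical helper: parse the text of the aoe driver's per-device
# "debug" file, in the layout of the sample output above (aoe v7.5).
MAC_LINE = re.compile(r"^([0-9a-fA-F]{12}):(\d+):(\d+):(\d+)$")
PAIR = re.compile(r"(\w+):\s*(\S+)")


def _num(value: str):
    """Return an int for decimal fields, else the raw string (e.g. pointers)."""
    return int(value) if value.isdigit() else value


def parse_aoe_debug(text: str) -> dict:
    device, targets, pending, current = {}, [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        mac = MAC_LINE.match(line)
        if mac:
            # Start of a target block; the three numbers are kept raw
            # (assumed nout:maxout:nframes, check the driver source).
            current = {"mac": mac.group(1),
                       "counts": tuple(int(g) for g in mac.groups()[1:]),
                       **pending}
            pending = {}
            targets.append(current)
        elif raw[:1] in (" ", "\t") and current is not None:
            if ":" in line:                      # e.g. "lost:4133967"
                key, value = line.split(":", 1)
                current[key] = _num(value.strip())
            else:                                # interface name, e.g. "eth5"
                current.setdefault("ifaces", []).append(line)
        elif line.startswith(("falloc:", "ffree:")):
            key, value = line.split(":", 1)      # precedes the next MAC line
            pending[key] = _num(value.strip())
            current = None
        else:                                    # device-wide counters
            for key, value in PAIR.findall(line):
                device[key] = _num(value)
    return {"device": device, "targets": targets}


def lost_ratio(target: dict) -> float:
    """Rough indicator: 'lost' relative to r + w (assumed packet counts)."""
    total = target.get("r", 0) + target.get("w", 0)
    return target.get("lost", 0) / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python aoe_debug.py '/sys/block/etherd!e1.0/debug'
    with open(sys.argv[1]) as f:
        info = parse_aoe_debug(f.read())
    dev = info["device"]
    print("rttavg", dev.get("rttavg"), "rttdev", dev.get("rttdev"))
    for t in info["targets"]:
        print(t["mac"], t.get("ifaces"), f"lost/(r+w) = {lost_ratio(t):.4%}")
```

Running it periodically and diffing the counters between samples shows whether lost or kicked is still climbing while a benchmark runs.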

The driver source code is small and easy to read, and it explains what each of these measurements means.  (In this example we have a pair of 1 Gbps links splitting the load.  We've reached ~180MB/s on sequential operations.  Our aoe driver is v7.5, which was current at the time.)
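To judge how close ~180MB/s is to the wire limit, a back-of-envelope ceiling is useful.  The sketch assumes the 8704-byte payload per frame (the maxbcnt above) and a per-frame overhead of preamble/SFD 8, Ethernet header 14, AoE header 10, ATA header 12, FCS 4, and inter-frame gap 12 bytes.  It also takes MB to mean 10^6 bytes.  Those figures are assumptions, and the result is an upper bound for one direction.

```python
# Back-of-envelope line-rate ceiling for AoE data transfers.
# Assumed on-wire overhead per frame (bytes): preamble+SFD 8, Ethernet
# header 14, AoE header 10, ATA header 12, FCS 4, inter-frame gap 12.
OVERHEAD_BYTES = 8 + 14 + 10 + 12 + 4 + 12


def payload_ceiling_mb_s(links: int, gbps_per_link: float,
                         payload_bytes: int = 8704) -> float:
    """Upper bound on data throughput in MB/s (10**6 bytes/s), one direction."""
    line_bytes_per_s = links * gbps_per_link * 1e9 / 8
    efficiency = payload_bytes / (payload_bytes + OVERHEAD_BYTES)
    return line_bytes_per_s * efficiency / 1e6


if __name__ == "__main__":
    ceiling = payload_ceiling_mb_s(links=2, gbps_per_link=1.0)
    print(f"2 x 1 Gbps ceiling: {ceiling:.0f} MB/s")
    print(f"180 MB/s is {180 / ceiling:.0%} of that")
    print(f"1 x 10 Gbps ceiling: {payload_ceiling_mb_s(1, 10.0):.0f} MB/s")
```

With jumbo frames the header overhead is under 1%, so most of the gap between ~180MB/s and the roughly 248MB/s ceiling comes from elsewhere (queueing, retransmits, or the array itself).  With a standard 1500-byte MTU, the payload per frame shrinks and the ceiling drops.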

-Jeff



_______________________________________________
Aoetools-discuss mailing list
Aoetools-discuss@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss



