Re: Tune IO stack to get max IOPS out of AIO

On Wed, Jan 30 2013, Alireza Haghdoost wrote:
> Hello
> 
> I am trying to tune the Linux IO stack to maximize application IOPS.
> I was wondering if there are any other parameters that I am missing to
> tune?
> 
> Right now I am using a raw block device to write sequential AIO
> requests with FIO and have set:
> 1. max possible value for the libaio completion queue,
> 2. max possible value for the IO scheduler queue size
> (/sys/block/sda/queue/nr_requests)
> 3. max possible value for the generic device driver queue depth
> (/sys/block/sda/device/queue_depth)
> 4. the noop IO scheduler
> 5. disabled IO merging (echo 2 > /sys/block/sda/queue/nomerges)
> 
> Note that the device (/dev/sda) is attached to the server over the
> network. Therefore, the generic device driver will send IOs to the
> Fibre Channel driver (QLogic qla2xxx).
> 
> Please kindly advise how I can push more IOs per second.

It's hard to answer this kind of question, since there are many things
that can be optimized. You really need a good understanding of the full
stack (app to device) and profiles/ideas on where the bottlenecks are.
On the fio side, you can relatively easily set batch parameters for
submitting and completing IO - these are the iodepth_batch_submit= and
iodepth_batch_complete= settings. They make fio submit and complete
multiple IOs at once, reducing the per-IO system call overhead. Whether
that has an actual impact on your performance (or is even a realistic
thing to do - I'm assuming you are using fio to model what your
application would do) is another question.
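
As a rough command line sketch (the job name, block size, and depth
values here are just examples, not recommendations - and note direct=1,
since buffered writes to the raw device would not exercise the async
submission path):

  fio --name=seqwrite --filename=/dev/sda --rw=write \
      --ioengine=libaio --direct=1 --bs=4k --iodepth=128 \
      --iodepth_batch_submit=16 --iodepth_batch_complete=16 \
      --runtime=60 --time_based

With those two batch settings, fio hands 16 IOs to each io_submit() call
and reaps completions roughly 16 at a time from io_getevents(), instead
of paying one system call per IO in each direction.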

Your suggestions should help reduce the overhead too, though increasing
the queue depth beyond the existing 256 will usually not yield much of a
benefit. The basic hash merging is also very cheap, so unless you truly
only have random IO, it might be worth keeping.
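
For completeness, the knobs in question look like this (assuming
/dev/sda as in your mail; the values are illustrative only, and a
nomerges of 1 keeps the cheap one-hit merging while skipping the
costlier merge lookups):

  echo noop > /sys/block/sda/queue/scheduler
  echo 1024 > /sys/block/sda/queue/nr_requests   # scheduler queue size
  echo 254 > /sys/block/sda/device/queue_depth   # driver/device queue depth
  echo 1 > /sys/block/sda/queue/nomerges         # 0=all merges, 1=one-hit
                                                 # merges only, 2=none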

You can also experiment with completion locations - that's the
rq_affinity setting in the same queue/ directory. A value of 1 will
migrate completions to/near the CPU group that submitted the IO, while a
value of 2 will migrate them to the specific submitting CPU. What makes
sense depends on the load of the CPU, and on how costly the completion
handling is...
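
That's a one-line switch either way (again assuming sda):

  echo 2 > /sys/block/sda/queue/rq_affinity   # complete on the exact
                                              # submitting CPU; echo 1
                                              # completes in its CPU group

Trying both under your real load and comparing is the only way to know
which wins for your setup.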

Most good settings will have to be experimentally deduced. But the
better an idea you have of where the problems are, the more settings you
can logically eliminate or prefer.

-- 
Jens Axboe
