Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:

> On Tue, 05 Jul 2011 16:35:21 -0400
> Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>
>> Hi,
>>
>> When looking at perf data for my aio-stress test runs against some
>> relatively fast storage, I noticed that __aio_get_req was taking up more
>> CPU than I expected.  The command line for aio-stress was:
>>
>>   aio-stress -O -r 8 -d 256 -b 32 -l -s 16384 /dev/sdk
>>
>> which means we're sending 32 iocbs to io_submit at a time.  Here's the
>> perf report -g output:
>>
>> Events: 41K cycles
>> -   7.33%  aio-stress  [kernel.kallsyms]  [k] _raw_spin_lock_irq
>>    - _raw_spin_lock_irq
>>       + 75.38% scsi_request_fn
>>       - 10.67% __aio_get_req
>>            io_submit_one
>>
>> The spinlock is the ctx_lock, taken when allocating the request.  This
>> seemed like pretty low-hanging fruit to me, so I cooked up a patch to
>> allocate 32 kiocbs at a time.  This is the result:
>>
>> Events: 50K cycles
>> -   5.54%  aio-stress  [kernel.kallsyms]  [k] _raw_spin_lock_irq
>>    - _raw_spin_lock_irq
>>       + 83.23% scsi_request_fn
>>       - 5.14% io_submit_one
>>
>> So, aio-stress now takes up ~2% less CPU.  This seems like a worthwhile
>> thing to do, but I don't have a system handy that would show any real
>> performance gains from it.  The higher the number of iocbs passed to
>> io_submit, the bigger the win this will be.
>>
>> FWIW, I tested this with multiple different batch sizes (1, 32, 33, 64,
>> 256) and it didn't blow up.
>
> So the benefit comes from reducing the frequency with which ctx_lock is
> taken, rather than from reducing its per-acquisition hold times.

That's right.  The lock is taken in both the submission and completion
paths.

>> ...
>>
>> +#define KIOCB_STASH_SIZE	32
>> +struct kiocb_stash {
>> +	unsigned head;
>> +	unsigned tail;
>> +	unsigned total;
>> +	unsigned used;
>> +	struct kiocb *kiocbs[KIOCB_STASH_SIZE];
>> +};
>
> Laziness begets laziness: you didn't bother documenting how this ring
> thing works and why it exists, so I didn't bother reverse-engineering
> that information.

Sorry.  I used a ring because I didn't want to introduce more allocations
in the submission path (though I have no real data on which to base this
premature optimization); a simplified sketch of the batching pattern
appears below.  As for documenting how it works, I thought it was pretty
straightforward.

> Once all this has been fixed, I might be wondering why it was done with
> a yukky fixed-size array rather than with a list.

I could change it to a list if that's preferred.  Remember, this was an
RFC.

> A pox on you for telling us there's a "potential starvation case" but
> not telling us what it is!  It's a kmalloc failure - why is this
> described as "starvation"?

Perhaps you missed that I simply moved that comment in the code; I didn't
add it.  When allocating an I/O context, the user supplies a maximum
number of events.  That number is used to size the ring buffer that holds
completion data.  It is possible that a completed request, still waiting
for its final fput, is keeping a new allocation from succeeding.  Calling
aio_fput_routine can free that up, allowing the allocation to succeed.

> Once all this has been fixed, I might be wondering how we test the
> "potential starvation case".

Probably by closing the fd prior to event completion, coupled with
submitting new I/O (on a different fd) after the completion event is
delivered to userspace but before the fput on the fd.
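To make the batching idea concrete, here is a minimal, self-contained
userspace sketch of the pattern being discussed; it is not the kernel
patch.  The type and function names (struct request, struct req_stash,
stash_refill(), stash_get()) are invented for illustration, and a pthread
mutex stands in for ctx_lock.  The point is simply that the lock is taken
once per refill of up to STASH_SIZE requests rather than once per request:

	#include <pthread.h>
	#include <stdlib.h>

	#define STASH_SIZE 32

	struct request {			/* stand-in for struct kiocb */
		int id;
	};

	struct req_stash {
		unsigned head;			/* next entry to hand out */
		unsigned total;			/* entries currently stashed */
		struct request *reqs[STASH_SIZE];
	};

	static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

	/*
	 * Allocate up to 'want' requests, then do the per-request
	 * bookkeeping for the whole batch under a single lock
	 * round-trip.  The bookkeeping here is only a placeholder for
	 * whatever the real code needs ctx_lock for.
	 */
	static void stash_refill(struct req_stash *s, unsigned want)
	{
		unsigned i, n = 0;

		if (want > STASH_SIZE)
			want = STASH_SIZE;

		/* The allocation itself does not need the lock. */
		while (n < want) {
			s->reqs[n] = calloc(1, sizeof(struct request));
			if (!s->reqs[n])
				break;
			n++;
		}

		pthread_mutex_lock(&ctx_lock);
		for (i = 0; i < n; i++)
			s->reqs[i]->id = (int)i;	/* placeholder bookkeeping */
		pthread_mutex_unlock(&ctx_lock);

		s->head = 0;
		s->total = n;
	}

	/* Hand out one request; NULL means the stash needs a refill. */
	static struct request *stash_get(struct req_stash *s)
	{
		if (s->head == s->total)
			return NULL;
		return s->reqs[s->head++];
	}

A submission loop would call stash_get() for each iocb and fall back to
stash_refill() only when the stash runs dry, so with 32 iocbs per
io_submit the lock round-trips for request allocation drop roughly
32-fold.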
> Various trivial fixies:
>
> From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
> - coding-style fixes
>
> - improve variable name(s)
>
> - fix typo(s)
>
> - use new kmap_atomic() interface
>
> - change comment to use the number of columns God gave us
>
> - use "bool" for a boolean
>
> Cc: Jeff Moyer <jmoyer@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

This was an RFC.  I'm happy to have it merged, but I would have been
equally happy to address your review comments first.  In the end,
whatever works best for you is fine with me.

Thanks for taking the time to look at it!

Cheers,
Jeff