Kent Overstreet <koverstreet@xxxxxxxxxx> writes:

> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async I/O API in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially services the
>> requests on a separate kernel thread. It was pointed out that this
>> would need to ensure that there aren't any unknown task_struct
>> references or dependencies under f_op->aio* which might get confused
>> by the kernel thread. Would that kind of full audit be enough, or
>> would it be considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> a workqueue is a realistic solution.
>
> One problem with the approach is that sometimes we _do_ need to block.
> The primary reason we block in submit_bio() if the request queue is
> too full is that our current I/O schedulers can't cope with unbounded
> queue depth; other processes will be starved and see unbounded I/O
> latencies.

The I/O schedulers have no problem coping with a larger queue depth. In
fact, the more I/O you let through to the scheduler, the better chance
you have of getting fairness between processes (not the other way
around, as you suggest). The sleeping on nr_requests is done to prevent
the I/O subsystem from eating up all of your kernel memory.

> This is even worse when a filesystem is involved and metadata
> operations get stuck at the end of a huge queue. By punting everything
> to a workqueue, all that's been accomplished is to hide the queueing
> and shove it up a layer.

That's a slightly different issue, and there's no need to hash it out
in this thread. One thing I do agree with is that, when you punt I/O to
a workqueue, you lose the ability to account that I/O to the proper
process.

> A similar problem exists with kernel memory usage, but it's even worse
> there because most users aren't using memcg. If we're short on memory,
> the process doing aio really needs to be throttled in io_submit() ->
> get_user_pages(); if it's punting everything to a workqueue, the other
> processes may have to compete against 1000 worker threads calling
> get_user_pages() simultaneously instead of just the one process doing
> aio.

Right, this hits on the inability to track the I/O back to the original
submitting process. I thought we had a plan to fix that (and I have
some really old patches for this that I never quite finished). Tejun?
Jens?

Cheers,
Jeff
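
P.S. For anyone not familiar with the API being discussed, here is a
minimal userspace sketch of the submit/reap pattern using libaio. The
file name, buffer size, and queue depth below are placeholders, not
anything from Ankit's patch; the point is just that io_submit() is the
call that is expected to return quickly, while io_getevents() is where
the application expects to wait.

/* Build with:  gcc -Wall -o aio_sketch aio_sketch.c -laio */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd, ret;

	/* O_DIRECT, since buffered aio largely falls back to synchronous behaviour */
	fd = open("testfile", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	if (io_setup(32, &ctx) < 0) {
		perror("io_setup");
		return 1;
	}

	io_prep_pread(&cb, fd, buf, 4096, 0);

	/*
	 * This is the call that is supposed to just queue the request and
	 * return; in practice it can block in submit_bio() when the request
	 * queue is full, or in get_user_pages() under memory pressure.
	 */
	ret = io_submit(ctx, 1, cbs);
	if (ret != 1) {
		fprintf(stderr, "io_submit returned %d\n", ret);
		return 1;
	}

	/* Reaping the completion is where the caller expects to block. */
	ret = io_getevents(ctx, 1, 1, &ev, NULL);
	printf("io_getevents returned %d, res %ld\n", ret, (long)ev.res);

	io_destroy(ctx);
	return 0;
}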