Kent Overstreet <koverstreet@xxxxxxxxxx> writes:

> On Fri, Mar 01, 2013 at 01:37:55AM +0530, Ankit Jain wrote:
>> Hi,
>>
>> I'm interested in discussing how to improve the async I/O API in the
>> kernel, specifically io_submit latencies.
>>
>> I am working on trying to make io_submit non-blocking. I had posted a
>> patch[1] for this earlier on fsdevel and there was some discussion on
>> it. I have made some of the improvements suggested there.
>>
>> The approach attempted in that patch essentially services the
>> requests on a separate kernel thread. It was pointed out that this
>> would need to ensure that there aren't any unknown task_struct
>> references or dependencies under f_op->aio* which might get confused
>> by the kernel thread. Would that kind of full audit be enough, or
>> would it be considered too fragile?
>
> Was just talking about this. Completely agreed that we need to do
> something about it, but personally I don't think punting everything to
> a workqueue is a realistic solution.
>
> One problem with the approach is that sometimes we _do_ need to block.
> The primary reason we block in submit_bio() if the request queue is
> too full is that our current I/O schedulers can't cope with unbounded
> queue depth; other processes will be starved and see unbounded I/O
> latencies.

The I/O schedulers have no problem coping with a larger queue depth. In
fact, the more I/O you let through to the scheduler, the better chance
you have of getting fairness between processes (not the other way
around, as you suggest). The sleeping on nr_requests is done to prevent
the I/O subsystem from eating up all of your kernel memory.

> This is even worse when a filesystem is involved and metadata
> operations get stuck at the end of a huge queue. By punting everything
> to a workqueue, all that's been accomplished is to hide the queueing
> and shove it up a layer.

That's a slightly different issue, and there's no need to hash it out
in this thread. One thing I do agree with is that, when you punt I/O to
a workqueue, you lose the ability to account that I/O to the proper
process.

> A similar problem exists with kernel memory usage, but it's even worse
> there because most users aren't using memcg. If we're short on memory,
> the process doing aio really needs to be throttled in io_submit() ->
> get_user_pages(); if it's punting everything to a workqueue, the other
> processes may have to compete against 1000 worker threads calling
> get_user_pages() simultaneously instead of just the one process doing
> aio.

Right, this hits on the inability to track the I/O back to the original
submitting process. I thought we had a plan to fix that (and I have
some really old patches for this that I never quite finished). Tejun?
Jens?

Cheers,
Jeff
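
P.S. For anyone not familiar with the API being discussed, here is a
minimal userspace sketch of the submit/reap pattern using libaio. The
file name, buffer size, and queue depth below are placeholders, not
anything from Ankit's patch; the point is just that io_submit() is the
call that is expected to return quickly, while io_getevents() is where
the application expects to wait.

/* Build with:  gcc -Wall -o aio_sketch aio_sketch.c -laio */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd, ret;

	/* O_DIRECT, since buffered aio largely falls back to synchronous behaviour */
	fd = open("testfile", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	if (io_setup(32, &ctx) < 0) {
		perror("io_setup");
		return 1;
	}

	io_prep_pread(&cb, fd, buf, 4096, 0);

	/*
	 * This is the call that is supposed to just queue the request and
	 * return; in practice it can block in submit_bio() when the request
	 * queue is full, or in get_user_pages() under memory pressure.
	 */
	ret = io_submit(ctx, 1, cbs);
	if (ret != 1) {
		fprintf(stderr, "io_submit returned %d\n", ret);
		return 1;
	}

	/* Reaping the completion is where the caller expects to block. */
	ret = io_getevents(ctx, 1, 1, &ev, NULL);
	printf("io_getevents returned %d, res %ld\n", ret, (long)ev.res);

	io_destroy(ctx);
	return 0;
}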