Re: [LSF/MM TOPIC][ATTEND] Improving async io, specifically io_submit latencies

(cc'ing Li and cgroups ML)

Hey, guys.

On Fri, Mar 01, 2013 at 10:03:47AM -0500, Jeff Moyer wrote:
> > This is even worse when a filesystem is involved and metadata operations
> > get stuck at the end of a huge queue.  By punting everything to
> > workqueue, all that's been accomplished is to hide the queueing and
> > shove it up a layer.
> 
> Slightly different issue there, but no need to hash it out in this
> thread.  One thing I do agree with is that, when you punt I/O to a
> workqueue, you lose the ability to account that I/O to the proper
> process.

The block layer now supports tagging bios with the issuer's identity.

  int bio_associate_current(struct bio *bio);
  void bio_disassociate_task(struct bio *bio);

After bio_associate_current() is performed on a bio, the block layer
will treat the bio, in terms of ioctx and blkcg, as if it's being
issued by the %current at the time of association, no matter which
task ends up doing the actual submission.
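
To make the semantics concrete, here is a rough user-space model of
the tagging idea.  It is not kernel code: struct model_task, struct
model_bio and every model_*() name are hypothetical stand-ins made up
for illustration, and the -1 return on double association is a choice
of this sketch, not a statement about the real function.

```c
/*
 * User-space model only, NOT kernel code.  All names here are
 * hypothetical; they only mimic the idea behind
 * bio_associate_current() / bio_disassociate_task().
 */
#include <assert.h>
#include <stddef.h>

/* Stand-in for the identity a task carries for IO accounting. */
struct model_task {
	int ioctx_id;	/* models the task's io context */
	int blkcg_id;	/* models the task's block cgroup */
};

/* Stand-in for a bio: only the issuer tag is modeled. */
struct model_bio {
	const struct model_task *issuer;	/* NULL until associated */
};

/*
 * Record the task that is "current" at association time.
 * Returns 0 on success, -1 if the bio is already tagged
 * (assumption of this sketch).
 */
static int model_bio_associate(struct model_bio *bio,
			       const struct model_task *current_task)
{
	assert(bio != NULL && current_task != NULL);
	if (bio->issuer)
		return -1;
	bio->issuer = current_task;
	return 0;
}

/* Drop the tag again. */
static void model_bio_disassociate(struct model_bio *bio)
{
	bio->issuer = NULL;
}

/*
 * Which blkcg gets charged when @submitter actually submits @bio?
 * A tagged bio is charged to its recorded issuer; an untagged one
 * falls back to whoever submits it.
 */
static int model_charged_blkcg(const struct model_bio *bio,
			       const struct model_task *submitter)
{
	const struct model_task *who = bio->issuer ? bio->issuer : submitter;

	return who->blkcg_id;
}
```

In this model, a bio tagged by the application and later submitted by
a worker is still charged to the application's blkcg, which is the
property that matters once submission gets punted to a workqueue.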

Async IO handling of blkcg is still utterly broken, so it isn't as
useful at this point yet, tho.

> > A similar problem exists with kernel memory usage, but it's even worse
> > there because most users aren't using memcg. If we're short on memory,
> > the process doing aio really needs to be throttled in io_submit() ->
> > get_user_pages(); if it's punting everything to workqueue, now the other
> > processes may have to compete against 1000 worker threads calling
> > get_user_pages() simultaneously instead of just the process doing aio.
> 
> Right, this hits on the inability to track the i/o to the original
> submitting process.  I thought we had a plan to fix that (and I have
> some really old patches for this that I never quite finished).  Tejun?
> Jens?

For IO, I think bio tagging should be able to handle most of it,
eventually.  For memory, ultimately, we want the workqueue tasks to be
able to assume the resource role of the work item issuer.  Associating
dynamically is nasty given the variety of cgroups - e.g. there might
not be any common CPUs between the sets allowed to the workqueue and
to the issuer - so I'm unsure whether we can reach a general solution.
However, workqueue is currently growing worker pools with custom
attributes, which will eventually cover cgroup association, and we can
use that for specific problem areas - i.e. create a matching workqueue
for each aio context (the backend pool is shared, so the overhead
isn't big).

One obstacle there is we currently don't have a way to say "this
workqueue belongs to this cgroup" as there is no "this" cgroup defined
(awesome design).  That part is being rectified but for the time being
we can probably say "this workqueue belongs to the same cgroups as
%current" which should be enough for aio contexts, I think.
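
As a rough illustration of the "same cgroups as %current" idea, here
is a user-space model where each aio context owns a workqueue that
inherits the creator's cgroup at setup time.  Nothing here is a real
kernel interface: struct model_wq, struct model_aio_ctx and the
model_*() helpers are hypothetical, and a single integer stands in
for the whole set of cgroups a task belongs to.

```c
/*
 * User-space model only, NOT kernel code.  All names are
 * hypothetical and only illustrate "workqueue inherits the
 * cgroups of %current at creation".
 */
#include <assert.h>
#include <stddef.h>

/* One integer stands in for a task's cgroup membership. */
struct model_cg_task {
	int cgroup_id;
};

/* Workqueue whose workers are charged to a fixed cgroup. */
struct model_wq {
	int cgroup_id;	/* inherited once, at creation */
};

/* An aio context with its own matching workqueue. */
struct model_aio_ctx {
	struct model_wq wq;
};

/*
 * Set up the context; the workqueue takes the cgroup of the task
 * that is "current" at this point, i.e. the context's creator.
 */
static void model_aio_ctx_init(struct model_aio_ctx *ctx,
			       const struct model_cg_task *current_task)
{
	assert(ctx != NULL && current_task != NULL);
	ctx->wq.cgroup_id = current_task->cgroup_id;
}

/*
 * Work punted through the context is charged to the workqueue's
 * cgroup, regardless of which worker thread ends up running it.
 */
static int model_work_charged_cgroup(const struct model_aio_ctx *ctx)
{
	return ctx->wq.cgroup_id;
}
```

Since an aio context is created by the process that uses it, fixing
the association at creation sidesteps the dynamic association problem
for this particular case.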

Thanks.

-- 
tejun