On Thu, 2011-04-14 at 20:49 -0500, Dave Chinner wrote:
> On Thu, Apr 14, 2011 at 02:29:34PM -0500, Alex Elder wrote:
> > On Tue, 2011-04-12 at 23:52 +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > >
> > > We currently have significant issues with the amount of stack that
> > > allocation in XFS uses, especially in the writeback path. We can
> > > easily consume 4k of stack between mapping the page, manipulating
> > > the bmap btree and allocating blocks from the free list. Not to
> > > mention btree block readahead and other functionality that issues IO
> > > in the allocation path.
> > >
> > > As a result, we can no longer fit allocation in the writeback path
> > > in the stack space provided on x86_64. To alleviate this problem,
> > > introduce an allocation workqueue and move all allocations to a
> > > separate context. This can be easily added as an interposing layer
> > > into xfs_alloc_vextent(), which takes a single argument structure
> > > and does not return until the allocation is complete or has failed.
> > >
> > > To do this, add a work structure and a completion to the allocation
> > > args structure. This allows xfs_alloc_vextent to queue the args onto
> > > the workqueue and wait for it to be completed by the worker. This
> > > can be done completely transparently to the caller.
> > >
> > > The worker function needs to ensure that it sets and clears the
> > > PF_MEMALLOC flag appropriately as it is being run in an active
> > > transaction context. Work can also be queued in a memory reclaim
> > > context, so a rescuer is needed for the workqueue.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Interesting.
> >
> > I guess I don't see anything inherently wrong with this.
> > At first glance it seems like workqueue abuse. But it's
> > better than rolling our own daemon to do the same thing.
> > (There's nothing to stop you from doing this generically
> > either...)
> >
> > Will it shift some accounting of CPU time from user to
> > system?
>
> Some, yes. For delayed allocation in the background flush or sync
> path, there is no difference because that isn't accounted to the
> user anyway. The only cases it will make a difference are for
> foreground writeback (e.g. fsync or write throttling), and for inode
> and directory allocation.

Linus made a comment relevant to this yesterday, but in reference
to RCU making use of kernel threads:
  Re: REGRESSION: Performance regressions from switching anon_vma->lock to mutex

    So using anonymous kernel threads is actually a real downside. It
    makes it much less obvious what is going on. We saw that exact same
    thing with the generic worker thread conversions: things that used
    to have clear performance issues ("oh, the iwl-phy0 thread is using
    3% of CPU time because it is polling for IO, and I can see that in
    'top'") turned into much-harder-to-see issues ("oh, kwork0 is using
    3% CPU time according to 'top' - I have no idea why").

Maybe if there were a way to reuse the generic worker infrastructure
but have it associated with a kernel module or something. Then again
that is sort of contrary to the purpose of the generic
centrally-managed pool.

Maybe if generic work could somehow get tagged with its real owner
or origin it would be better, but I really have no idea how one
would go about that.

In any case, I think we need to be cautious about moving in this
direction.

					-Alex

> > The code looks OK to me. The idea of it makes me pause
> > a little, though I don't object.
>
> It's definitely a different way of thinking, but I'm seriously
> starting to consider pushing more operations into asynchronous
> background tasks that the caller can wait on if they so desire.
> e.g. EOF truncation during xfs_inactive and xfs_release(), flushing
> a range of a file, inode cluster freeing, CIL flushes, etc.
>
> We've got multiple cores in most machines these days (even the low
> end is going multi-core), context switch overhead is pretty low and
> we've got concurrency managed work queues so we don't need thread
> pools - we should take advantage of what they provide....
>
> Cheers,
>
> Dave.
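For reference, the interposing layer described in the commit message above
boils down to a small pattern: attach a work item and a completion to the
allocation arguments, run the allocation from a worker with PF_MEMALLOC set,
and have the synchronous entry point queue the work and wait. The sketch
below illustrates only that pattern, not the actual patch; the wrapper
struct, the __xfs_alloc_vextent() name and the flag handling are assumed
placeholders.

#include <linux/workqueue.h>
#include <linux/completion.h>
#include <linux/sched.h>

struct xfs_alloc_arg;                       /* real definition: fs/xfs/xfs_alloc.h */

/* created elsewhere with WQ_MEM_RECLAIM so the queue has a rescuer thread */
extern struct workqueue_struct *xfs_alloc_wq;

/*
 * The patch adds the work/completion/result fields to the allocation args
 * structure itself; a wrapper is used here only to keep the sketch
 * self-contained.
 */
struct xfs_alloc_ctx {
	struct work_struct	work;
	struct completion	done;
	struct xfs_alloc_arg	*args;
	int			result;
};

/* stand-in name for the original allocation entry point */
int __xfs_alloc_vextent(struct xfs_alloc_arg *args);

static void
xfs_alloc_vextent_worker(struct work_struct *work)
{
	struct xfs_alloc_ctx	*ctx = container_of(work, struct xfs_alloc_ctx, work);
	unsigned int		was_memalloc = current->flags & PF_MEMALLOC;

	/*
	 * The allocation runs on behalf of a task holding an active
	 * transaction, so mark this worker PF_MEMALLOC for the duration
	 * and restore the previous state afterwards.
	 */
	current->flags |= PF_MEMALLOC;
	ctx->result = __xfs_alloc_vextent(ctx->args);
	if (!was_memalloc)
		current->flags &= ~PF_MEMALLOC;

	/* ctx lives on the caller's stack; don't touch it after this */
	complete(&ctx->done);
}

/*
 * Same synchronous interface the callers already use: queue the allocation
 * to the workqueue and wait, so only the stack the allocation runs on
 * changes.
 */
int
xfs_alloc_vextent(struct xfs_alloc_arg *args)
{
	struct xfs_alloc_ctx	ctx = { .args = args };

	INIT_WORK_ONSTACK(&ctx.work, xfs_alloc_vextent_worker);
	init_completion(&ctx.done);

	queue_work(xfs_alloc_wq, &ctx.work);
	wait_for_completion(&ctx.done);
	return ctx.result;
}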
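Dave's closing point about concurrency-managed workqueues is that creating
one is essentially all the "thread pool" management a subsystem needs, and
the commit message notes a rescuer is required because work can be queued
from memory reclaim context. A rough sketch of that setup follows, again
with assumed function names: WQ_MEM_RECLAIM is the flag that attaches the
rescuer thread, and a max_active of 0 asks for the default concurrency limit.

#include <linux/workqueue.h>
#include <linux/errno.h>
#include <linux/init.h>

struct workqueue_struct *xfs_alloc_wq;

int __init xfs_alloc_wq_init(void)
{
	/*
	 * The rescuer guarantees forward progress when new worker threads
	 * cannot be created under memory pressure, which matters because
	 * allocations can be queued from reclaim context.
	 */
	xfs_alloc_wq = alloc_workqueue("xfsalloc", WQ_MEM_RECLAIM, 0);
	if (!xfs_alloc_wq)
		return -ENOMEM;
	return 0;
}

void xfs_alloc_wq_destroy(void)
{
	destroy_workqueue(xfs_alloc_wq);
}

With the queue created at init time, the synchronous wrapper in the previous
sketch can queue work from any context, including reclaim, without relying
on new worker threads being spawnable at that moment.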