On Tue 05-08-14 15:22:17, Dave Chinner wrote:
> On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
> > Hello,
> >
> > here is my attempt to implement per-superblock tracking of dirty
> > inodes. I have two motivations for this:
> >
> > 1) I've tried to get rid of the overwriting of an inode's dirty
> >    timestamp during writeback, and the filtering of dirty inodes by
> >    superblock makes this significantly harder. For similar reasons,
> >    improving the scalability of inode dirty tracking is also more
> >    complicated than it has to be.
> >
> > 2) Filesystems like Tux3 (but to some extent also XFS) would like to
> >    influence the order in which inodes are written back. Currently
> >    this isn't possible. Tracking dirty inodes per superblock makes it
> >    easy to later implement a filesystem callback for writing back
> >    inodes, and possibly also to allow filesystems to implement their
> >    own dirty tracking if they desire.
> >
> > The patches pass an xfstests run and also some sync livelock
> > avoidance tests I have, with 4 filesystems on 2 disks, so they should
> > be reasonably sound. Before I go and base more work on this, I'd like
> > to hear some feedback on whether people find the approach sane and
> > workable.
> >
> > After this patch set it is trivial to provide a per-sb callback for
> > writeback (at the level of writeback_inodes()). It is also fairly
> > easy to allow a filesystem to completely override dirty tracking
> > (that only needs some restructuring of mark_inode_dirty()). I can
> > write these as proof-of-concept patches for the Tux3 guys once the
> > general approach in this patch set is acked. Or, if there are some
> > in-tree users (XFS? btrfs?), I can include them in the patch set.
> >
> > Any comments welcome!
>
> My initial performance tests haven't shown any regressions, but
> those same tests show that we still need to add plugging to
> writeback_inodes(). Patch with numbers below. I haven't done any
> sanity testing yet - I'll do that over the next few days...

Thanks for the tests! I was concentrating on the no-regression part
first, with possible performance improvements added on top of that. I
have added your patch with plugging to the series (the rough shape of
the change is sketched in the PS below). Thanks for that.

> FWIW, the patch set doesn't solve the sync lock contention problems -
> populate all of memory with millions of inodes on a mounted
> filesystem, then run xfs/297 on a different filesystem. The system
> will trigger major contention in sync_inodes_sb() and
> inode_sb_list_add() on the inode_sb_list_lock, because xfs/297 will
> cause lots of concurrent sync() calls to occur. The system will
> perform really badly on anything filesystem related while this
> contention occurs. Normally xfs/297 runs in 36s on the machine I just
> ran this test on; with the extra cached inodes it's been running for
> 15 minutes, burning 8-9 CPU cores, and there's no end in sight...

Yes, I didn't mean to address this yet. When I last looked into this
problem, the redirty_tail() logic made it really difficult to handle
inodes that are both dirty and under writeback (I didn't want to add
another list_head to struct inode for a completely separate
under-writeback list). So I deferred this until redirty_tail() gets
sorted out. But maybe I should revisit it with the per-sb dirty
tracking, unless you beat me to it ;).

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
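
PS: For anyone following along, the plugging change is conceptually
tiny: hold a block plug across the whole dirty inode walk so the small
writes submitted per inode get merged before dispatch. A minimal sketch
(assuming writeback_inodes() ends up with roughly this shape - the
do_writeback_inodes() helper is a stand-in name, and Dave's actual
patch with numbers is in his mail):

#include <linux/blkdev.h>	/* blk_start_plug(), blk_finish_plug() */

static long writeback_inodes(struct super_block *sb,
			     struct wb_writeback_work *work)
{
	struct blk_plug plug;
	long wrote;

	/*
	 * Hold a plug across the whole per-sb dirty list walk so that
	 * the many small I/Os submitted for individual inodes are
	 * merged and dispatched as larger requests on unplug.
	 */
	blk_start_plug(&plug);
	wrote = do_writeback_inodes(sb, work);	/* stand-in for the loop */
	blk_finish_plug(&plug);

	return wrote;
}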
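
PPS: And to make the per-sb writeback callback from my original posting
concrete: once writeback walks a per-sb list, the hook is just a branch
at the top of the per-sb walk. The ->write_inodes() name and signature
below are invented for illustration only; nothing like this is in the
series yet:

#include <linux/fs.h>
#include <linux/writeback.h>

/*
 * Illustration only: a possible hook in struct super_operations,
 *
 *	long (*write_inodes)(struct super_block *sb,
 *			     struct writeback_control *wbc);
 *
 * called in place of the generic per-sb dirty list walk.
 */
static long sb_writeback_inodes(struct super_block *sb,
				struct writeback_control *wbc)
{
	/*
	 * If the filesystem wants to control the order in which its
	 * dirty inodes are written back (as Tux3 does), let it take
	 * over; otherwise fall back to the generic walk (the generic
	 * helper name here is also made up).
	 */
	if (sb->s_op->write_inodes)
		return sb->s_op->write_inodes(sb, wbc);

	return generic_writeback_sb_inodes(sb, wbc);
}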