On Tue 05-08-14 15:22:17, Dave Chinner wrote:
> On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
> > Hello,
> >
> > here is my attempt to implement per-superblock tracking of dirty
> > inodes. I have two motivations for this:
> >
> > 1) I've tried to get rid of the overwriting of an inode's dirty
> >    timestamp during writeback, and the filtering of dirty inodes by
> >    superblock makes this significantly harder. For similar reasons,
> >    improving the scalability of inode dirty tracking is also more
> >    complicated than it has to be.
> >
> > 2) Filesystems like Tux3 (but to some extent also XFS) would like to
> >    influence the order in which inodes are written back. Currently
> >    this isn't possible. Tracking dirty inodes per superblock makes it
> >    easy to later implement a filesystem callback for writing back
> >    inodes, and possibly also to allow filesystems to implement their
> >    own dirty tracking if they desire.
> >
> > The patches pass an xfstests run and also some sync livelock
> > avoidance tests I have, with 4 filesystems on 2 disks, so they should
> > be reasonably sound. Before I go and base more work on this, I'd like
> > to hear some feedback on whether people find the approach sane and
> > workable.
> >
> > After this patch set it is trivial to provide a per-sb callback for
> > writeback (at the level of writeback_inodes()). It is also fairly
> > easy to allow a filesystem to completely override dirty tracking
> > (that only needs some restructuring of mark_inode_dirty()). I can
> > write these as proof-of-concept patches for the Tux3 guys once the
> > general approach in this patch set is acked. Or, if there are some
> > in-tree users (XFS? btrfs?), I can include them in the patch set.
> >
> > Any comments welcome!
>
> My initial performance tests haven't shown any regressions, but
> those same tests show that we still need to add plugging to
> writeback_inodes(). Patch with numbers below. I haven't done any
> sanity testing yet - I'll do that over the next few days...

Thanks for the tests! I was concentrating on the no-regression part
first, with possible performance improvements added on top of that. I
have added your patch with plugging to the series (the rough shape of
the change is sketched in the PS below). Thanks for that.

> FWIW, the patch set doesn't solve the sync lock contention problems -
> populate all of memory with millions of inodes on a mounted
> filesystem, then run xfs/297 on a different filesystem. The system
> will trigger major contention in sync_inodes_sb() and
> inode_sb_list_add() on the inode_sb_list_lock, because xfs/297 will
> cause lots of concurrent sync() calls to occur. The system will
> perform really badly on anything filesystem related while this
> contention occurs. Normally xfs/297 runs in 36s on the machine I just
> ran this test on; with the extra cached inodes it's been running for
> 15 minutes, burning 8-9 CPU cores, and there's no end in sight...

Yes, I didn't mean to address this yet. When I last looked into this
problem, the redirty_tail() logic made it really difficult to handle
inodes that are both dirty and under writeback (I didn't want to add
another list_head to struct inode for a completely separate
under-writeback list). So I deferred this until redirty_tail() gets
sorted out. But maybe I should revisit it with the per-sb dirty
tracking, unless you beat me to it ;).

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
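
PS: For anyone following along, the plugging change is conceptually
tiny: hold a block plug across the whole dirty inode walk so the small
writes submitted per inode get merged before dispatch. A minimal sketch
(assuming writeback_inodes() ends up with roughly this shape - the
do_writeback_inodes() helper is a stand-in name, and Dave's actual
patch with numbers is in his mail):

#include <linux/blkdev.h>	/* blk_start_plug(), blk_finish_plug() */

static long writeback_inodes(struct super_block *sb,
			     struct wb_writeback_work *work)
{
	struct blk_plug plug;
	long wrote;

	/*
	 * Hold a plug across the whole per-sb dirty list walk so that
	 * the many small I/Os submitted for individual inodes are
	 * merged and dispatched as larger requests on unplug.
	 */
	blk_start_plug(&plug);
	wrote = do_writeback_inodes(sb, work);	/* stand-in for the loop */
	blk_finish_plug(&plug);

	return wrote;
}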
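
PPS: And to make the per-sb writeback callback from my original posting
concrete: once writeback walks a per-sb list, the hook is just a branch
at the top of the per-sb walk. The ->write_inodes() name and signature
below are invented for illustration only; nothing like this is in the
series yet:

#include <linux/fs.h>
#include <linux/writeback.h>

/*
 * Illustration only: a possible hook in struct super_operations,
 *
 *	long (*write_inodes)(struct super_block *sb,
 *			     struct writeback_control *wbc);
 *
 * called in place of the generic per-sb dirty list walk.
 */
static long sb_writeback_inodes(struct super_block *sb,
				struct writeback_control *wbc)
{
	/*
	 * If the filesystem wants to control the order in which its
	 * dirty inodes are written back (as Tux3 does), let it take
	 * over; otherwise fall back to the generic walk (the generic
	 * helper name here is also made up).
	 */
	if (sb->s_op->write_inodes)
		return sb->s_op->write_inodes(sb, wbc);

	return generic_writeback_sb_inodes(sb, wbc);
}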