On Mon 24-11-14 23:33:35, Ted Tso wrote:
> On Tue, Nov 25, 2014 at 12:52:39PM +1100, Dave Chinner wrote:
> > > +static void flush_sb_dirty_time(struct super_block *sb)
> > > +{
> ...
> > > +}
> >
> > This just seems wrong to me, not to mention extremely expensive when we have
> > millions of cached inodes on the superblock.
>
> #1, It only gets called on a sync(2) or syncfs(2), which can be pretty
> expensive as it is, so I didn't really worry about it.
>
> #2, We're already iterating over all of the inodes in the sync(2) or
> syncfs(2) path, so the code path in question is already O(N) in the
> number of inodes.
>
> > Why can't we just add a function like mark_inode_dirty_time() which
> > puts the inode on the dirty inode list with a writeback time 24
> > hours in the future rather than 30s in the future?
>
> I was concerned about putting them on the dirty inode list because it
> would be extra inodes that the writeback threads would have to skip
> over and ignore (since they would not be dirty in the inode or data
> pages sense).

I agree this isn't going to work easily. Currently the flusher relies on
the dirty list being sorted by i_dirtied_when, and that gets harder to
maintain if we ever have inodes with i_dirtied_when in the future (we'd
have to sort in newly dirtied inodes).

> Another solution would be to use a separate linked list for dirtytime
> inodes, but that means adding some extra fields to the inode
> structure, which some might view as bloat.  Given #1 and #2 above,
> yes, we're doubling the CPU cost for sync(2) and syncfs(2), since
> we're now iterating over all of the inodes twice, but I believe that
> is a better trade-off than bloating the inode structure with an extra
> set of linked lists or increasing the CPU cost to the writeback path
> (which gets executed way more often than the sync or syncfs paths).

This would be possible and, as Boaz says, it might be possible to reuse
the same list_head in the inode for this.
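To make the separate-list idea concrete, here is a rough userspace sketch,
not kernel code. It assumes an inode is on at most one writeback list at a
time, so a single list_head can serve both the regular dirty list and a
dirtytime list. All names (sketch_inode, sketch_wb, b_dirty_time,
sync_dirty_time, demo_flushed_count) are made up for illustration; only the
intrusive list pattern is borrowed from the kernel.

```c
/* Userspace sketch only: every name below is illustrative, not kernel API. */
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, modelled on the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Unlink an entry and leave it self-linked (safe on an unlinked entry). */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

enum sketch_state { SK_CLEAN, SK_DIRTY_TIME, SK_DIRTY };

/* Hypothetical inode: one list_head is shared by both lists. */
struct sketch_inode {
	unsigned long ino;
	enum sketch_state state;
	struct list_head wb_list;	/* on b_dirty OR b_dirty_time, never both */
};

#define sketch_entry(ptr) \
	((struct sketch_inode *)((char *)(ptr) - offsetof(struct sketch_inode, wb_list)))

/* Hypothetical writeback state with a separate dirtytime list. */
struct sketch_wb {
	struct list_head b_dirty;	/* what the flusher would scan */
	struct list_head b_dirty_time;	/* scanned only on sync */
};

static void wb_init(struct sketch_wb *wb)
{
	list_init(&wb->b_dirty);
	list_init(&wb->b_dirty_time);
}

static void inode_init(struct sketch_inode *i, unsigned long ino)
{
	i->ino = ino;
	i->state = SK_CLEAN;
	list_init(&i->wb_list);
}

/* Timestamp-only update: park a clean inode on b_dirty_time. */
static void mark_inode_dirty_time(struct sketch_wb *wb, struct sketch_inode *i)
{
	if (i->state != SK_CLEAN)
		return;			/* already queued somewhere */
	i->state = SK_DIRTY_TIME;
	list_add_tail(&i->wb_list, &wb->b_dirty_time);
}

/* Real dirtying: move to b_dirty, reusing the same list_head. */
static void mark_inode_dirty(struct sketch_wb *wb, struct sketch_inode *i)
{
	if (i->state == SK_DIRTY)
		return;			/* keep its position in b_dirty */
	list_del_init(&i->wb_list);
	i->state = SK_DIRTY;
	list_add_tail(&i->wb_list, &wb->b_dirty);
}

/* sync path: walk only b_dirty_time instead of every inode on the sb. */
static int sync_dirty_time(struct sketch_wb *wb)
{
	int flushed = 0;

	while (!list_empty(&wb->b_dirty_time)) {
		struct sketch_inode *i = sketch_entry(wb->b_dirty_time.next);

		list_del_init(&i->wb_list);
		i->state = SK_CLEAN;	/* stand-in for writing the timestamps */
		flushed++;
	}
	return flushed;
}

/* Three inodes: a is timestamp-dirty, b becomes really dirty, c is clean. */
int demo_flushed_count(void)
{
	struct sketch_wb wb;
	struct sketch_inode a, b, c;

	wb_init(&wb);
	inode_init(&a, 1);
	inode_init(&b, 2);
	inode_init(&c, 3);
	mark_inode_dirty_time(&wb, &a);
	mark_inode_dirty_time(&wb, &b);
	mark_inode_dirty(&wb, &b);	/* b leaves b_dirty_time for b_dirty */
	return sync_dirty_time(&wb);	/* visits a only */
}
```

In this sketch the sync-time walk costs time proportional to the number of
timestamp-dirty inodes rather than to all cached inodes, and the flusher's
list never sees entries it would have to skip. Locking and the ordering of
b_dirty by dirtied time are left out entirely.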
Getting rid of the full scan of all superblock inodes would be nice (the
scan gets really expensive for large numbers of inodes, think of
i_sb_lock contention, and this makes it twice as bad), so I'd prefer to
do this if possible.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR