From: Dave Chinner <dchinner@xxxxxxxxxx>

When the inode cache shrinker runs, we may have lots of dirty inodes
queued up in the VFS dirty queues that have not been expired. The
typical case for this with XFS is atime updates. The result is that a
highly concurrent workload that copies files and then later reads them
(say to verify checksums) dirties all the inodes again, even when
relatime is used.

In a constrained memory environment, this results in a large number of
dirty inodes using all of the available memory, and memory reclaim is
unable to free them because dirty inodes are considered active. This
problem was uncovered by Chris Mason during recent low memory stress
testing.

The fix is to trigger VFS level writeback from the XFS inode cache
shrinker if there isn't already writeback in progress. This ensures
that when we enter a low memory situation we start cleaning inodes (via
the flusher thread) on the filesystem immediately, thereby making it
more likely that we will be able to evict those dirty inodes from the
VFS in the near future.

The mechanism is not perfect - it only acts on the current filesystem,
so if all the dirty inodes are on a different filesystem it won't help.
However, it seems to be a valid assumption that the filesystem with
lots of dirty inodes is going to have the shrinker called very soon
after the memory shortage begins, so this shouldn't be an issue.

The other flaw is that there is no guarantee that the flusher thread
will make progress fast enough to clean the dirty inodes so they can be
reclaimed in the near future. However, this mechanism does improve the
resilience of the filesystem under the test conditions - instead of
reliably triggering the OOM killer 20 minutes into the stress test, it
took more than 6 hours before it happened.

This small addition definitely improves the low memory resilience of
XFS on this type of workload, and best of all it has no impact on
performance when memory is not constrained.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9ad9560..c240d46 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -1038,6 +1038,17 @@ xfs_reclaim_inode_shrink(
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
 
+		/*
+		 * make sure VFS is cleaning inodes so they can be pruned
+		 * and marked for reclaim in the XFS inode cache. If we don't
+		 * do this the VFS can accumulate dirty inodes and we can OOM
+		 * before they are cleaned by the periodic VFS writeback.
+		 *
+		 * This takes VFS level locks, so we can only do this after
+		 * the __GFP_FS checks otherwise lockdep gets really unhappy.
+		 */
+		writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
+
 		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
 		/* terminate if we don't exhaust the scan */
-- 
1.7.2.3
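
For readers who want to see the control flow in isolation, here is a
minimal userspace sketch of the "kick writeback only if idle" pattern
the patch relies on. It is not the kernel implementation: struct
model_sb, model_writeback_if_idle() and model_shrink() are hypothetical
names invented for illustration, and the boolean gfp_fs stands in for
the __GFP_FS check. It assumes the idle check is a simple flag test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a superblock's writeback state. */
struct model_sb {
	bool writeback_in_progress;	/* is the flusher already busy? */
	unsigned long dirty_inodes;	/* inodes waiting to be cleaned */
	unsigned long queued;		/* inodes handed to the flusher */
};

/*
 * Start writeback for up to nr inodes, but only if none is running.
 * Returns 1 if writeback was started, 0 if it was already in progress.
 */
static int model_writeback_if_idle(struct model_sb *sb, unsigned long nr)
{
	assert(sb != NULL);

	if (sb->writeback_in_progress)
		return 0;

	sb->queued = nr < sb->dirty_inodes ? nr : sb->dirty_inodes;
	sb->writeback_in_progress = true;
	return 1;
}

/*
 * Shrinker model: refuse to do anything that needs filesystem locks
 * when the caller did not allow it, otherwise kick writeback before
 * the (omitted) reclaim pass.
 */
static int model_shrink(struct model_sb *sb, unsigned long nr_to_scan,
			bool gfp_fs)
{
	if (!gfp_fs)
		return -1;

	model_writeback_if_idle(sb, nr_to_scan);
	return 0;
}
```

The ordering is the point of the sketch: the permission check comes
first, so the writeback kick is never reached from a context that may
not take filesystem level locks, and a second shrinker call while
writeback is running does not queue more work.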