[RFC, PATCH] Make memory reclaim from inodes and dentry cache more scalable

The following patch detects when the inode and dentry caches are really
low on free entries, and skips reclaiming memory from them when it is
futile to do so.  We only resume reclaiming memory from the inode and
dentry caches once a reasonable number of free objects has built up
there.  This avoids bottlenecking on sb_lock to do useless memory
reclamation.
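
To make the intended behaviour concrete, here is a minimal userspace
sketch of the low/high watermark hysteresis described above.  It is an
illustration only and not part of the patch: struct cache_state,
should_skip_reclaim() and update_cache_low() are hypothetical names, and
the sketch ignores the "+ 1" that prune_super() adds to total_objects.
The two thresholds mirror sb_cache_himark and sb_cache_lowmark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds mirroring sb_cache_himark / sb_cache_lowmark in the patch. */
static const int cache_himark = 100;
static const int cache_lowmark = 5;

/* Hypothetical stand-in for the per-superblock state used by the patch. */
struct cache_state {
	int  nr_free;	/* unused (reclaimable) dentries + inodes */
	bool cache_low;	/* set once the cache has been drained */
};

/*
 * Cheap early check, done before any lock is taken: skip reclaim while
 * the cache is flagged low and has not yet refilled to the high mark.
 */
static bool should_skip_reclaim(const struct cache_state *c)
{
	assert(c != NULL);
	return c->cache_low && c->nr_free < cache_himark;
}

/*
 * Update the flag after the objects have been counted.  The flag is set
 * at or below the low mark and cleared only above the high mark, so it
 * does not flap while the count sits between the two marks.
 */
static void update_cache_low(struct cache_state *c)
{
	assert(c != NULL);
	if (!c->cache_low && c->nr_free <= cache_lowmark)
		c->cache_low = true;
	else if (c->cache_low && c->nr_free > cache_himark)
		c->cache_low = false;
}
```

With these values, a cache that drops to 3 free objects is flagged low
and is then skipped at 50 free objects, but is scanned again once it
holds more than 100.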

I assume that it is okay to read the super block's count of free objects
without sb_lock, as we are holding the shrinker list's read lock.  The
shrinker is still registered, so the super block has not yet been
deactivated, since deactivation requires unregistering the shrinker.  It
would be great if Al could comment on whether this assumption is okay.

In a test scenario where the page cache puts heavy pressure on memory
with a large number of processes, we saw very heavy contention on
sb_lock while trying to get free pages, as seen in the following
profile.  The patch reduced the runtime by almost a factor of 4.

    62.81%               cp  [kernel.kallsyms]           [k] _raw_spin_lock
                         |
                         --- _raw_spin_lock
                            |
                            |--45.19%-- grab_super_passive
                            |          prune_super
                            |          shrink_slab
                            |          do_try_to_free_pages
                            |          try_to_free_pages
                            |          __alloc_pages_nodemask
                            |          alloc_pages_current


Tim

Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
---
diff --git a/fs/super.c b/fs/super.c
index 8760fe1..e91c7506 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -38,6 +38,9 @@
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
+int	sb_cache_himark = 100;
+int	sb_cache_lowmark = 5;
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
@@ -60,6 +63,20 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
 	if (sc->nr_to_scan && !(sc->gfp_mask & __GFP_FS))
 		return -1;
 
+	/* Don't do useless reclaim unless we have reasonable amount
+	 * of free objects to avoid sb_lock contention.
+	 * Should be okay to reference sb content without sb_lock as we are
+	 * holding shrinker list's read lock, which means shrinker is still
+	 * registered. So sb is not yet deactivated which requires shrinker
+	 * un-registration.
+	 */
+	if (sb->cache_low) {
+		total_objects = sb->s_nr_dentry_unused +
+				sb->s_nr_inodes_unused + fs_objects;
+		if (total_objects < sb_cache_himark)
+			return 0;
+	}
+
 	if (!grab_super_passive(sb))
 		return !sc->nr_to_scan ? 0 : -1;
 
@@ -69,6 +86,9 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
 	total_objects = sb->s_nr_dentry_unused +
 			sb->s_nr_inodes_unused + fs_objects + 1;
 
+	if (!sb->cache_low && total_objects <= sb_cache_lowmark)
+		sb->cache_low = 1;
+
 	if (sc->nr_to_scan) {
 		int	dentries;
 		int	inodes;
@@ -96,6 +116,9 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc)
 				sb->s_nr_inodes_unused + fs_objects;
 	}
 
+	if (sb->cache_low && total_objects > sb_cache_himark)
+		sb->cache_low = 0;
+
 	total_objects = (total_objects / 100) * sysctl_vfs_cache_pressure;
 	drop_super(sb);
 	return total_objects;
@@ -184,6 +207,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		s->s_shrink.seeks = DEFAULT_SEEKS;
 		s->s_shrink.shrink = prune_super;
 		s->s_shrink.batch = 1024;
+		s->cache_low = 0;
 	}
 out:
 	return s;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 386da09..c0465e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1496,6 +1496,7 @@ struct super_block {
 
 	/* Being remounted read-only */
 	int s_readonly_remount;
+	int cache_low;
 };
 
 /* superblock cache pruning functions */



