* George Spelvin <linux@xxxxxxxxxxx> wrote:

> First, an actual, albeit minor, bug: initializing both vmap_info_gen
> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not.

Ha! :-) Fixed.

> vmap_info_gen should be initialized to 1 to force an initial
> cache update.

Yeah.

> Second, I don't see why you need a 64-bit counter.  Seqlocks consider
> 32 bits (31 bits, actually, the lsbit means "update in progress") quite
> a strong enough guarantee.

Just out of general paranoia - but you are right, and this would lower the 
overhead on 32-bit SMP platforms a bit, plus it avoids 64-bit word-tearing 
artifacts on 32-bit platforms. I modified it to u32.

> Third, it seems as though vmap_info_cache_gen is basically a duplicate
> of vmap_info_lock.sequence.  It should be possible to make one variable
> serve both purposes.

Correct, I alluded to that in my description:

> > Note that there's an even simpler variant possible I think: we could
> > use just the two generation counters and barriers to remove the
> > seqlock.

> You just need a kludge to handle the case of multiple vmap_info updates
> between cache updates.
>
> There are two simple ones:
>
> 1) Avoid bumping vmap_info_gen unnecessarily.  In vmap_unlock(), do
>    vmap_info_gen = (vmap_info_lock.sequence | 1) + 1;
>
> 2) - Make vmap_info_gen a seqcount_t
>    - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen)
>    - In get_vmalloc_info, inside the seqlock critical section, do
>      vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1;
>      (Using the vmap_info_gen.sequence read while validating the
>      cache in the first place.)
>
> I should try to write an actual patch illustrating this.

So I think something like the patch below is even simpler than trying to 
kludge generation counter semantics into seqcounts: I used two generation 
counters and a spinlock. The fast path is completely lockless and 
lightweight on modern SMP platforms (where smp_rmb() is a no-op or very 
cheap).
There's not even a seqlock retry loop: instead, an invalid cache causes us 
to fall back to the old behavior - and the freshest result is guaranteed 
to end up in the cache.

The line count got a bit larger, but half of it is comments.

Note that the generation counters are signed integers so that this 
comparison can be done:

+	if (gen-vmap_info_cache_gen > 0) {

Thanks,

	Ingo

======================>
>From 1a4c168a22cc302282cbd1bf503ecfc4dc52b74f Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Sat, 22 Aug 2015 12:28:01 +0200
Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info

Linus reported that for scripting-intense workloads such as the Git
build, glibc's qsort will read /proc/meminfo for every process created
(by way of get_phys_pages()), which causes the Git build to generate a
surprising amount of kernel overhead.

A fair chunk of the overhead is due to get_vmalloc_info() - which walks
a potentially long list to do its statistics.

Modify Linus's jiffies-based patch to use generation counters to cache
the vmalloc info: vmap_unlock() increases the generation counter, and
get_vmalloc_info() reads it and compares it against a cached generation
counter.

Also use a spinlock to make sure we always publish a consistent set of
vmalloc statistics.
Reported-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 mm/vmalloc.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 605138083880..23df06ebb48a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING	0x02
 #define VM_VM_AREA	0x04
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * A spinlock and two generation counters for a simple cache of the
+ * vmalloc allocation statistics info printed in /proc/meminfo.
+ *
+ * ( The assumption of the optimization is that it's read frequently, but
+ *   modified infrequently. )
+ */
+static DEFINE_SPINLOCK(vmap_info_lock);
+static int vmap_info_gen = 1;
+static int vmap_info_cache_gen;
+static struct vmalloc_info vmap_info_cache;
+#endif
 
 static inline void vmap_lock(void)
 {
@@ -285,6 +299,9 @@ static inline void vmap_lock(void)
 
 static inline void vmap_unlock(void)
 {
+#ifdef CONFIG_PROC_FS
+	WRITE_ONCE(vmap_info_gen, vmap_info_gen+1);
+#endif
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
-void get_vmalloc_info(struct vmalloc_info *vmi)
+static void calc_vmalloc_info(struct vmalloc_info *vmi)
 {
 	struct vmap_area *va;
 	unsigned long free_area_size;
@@ -2746,5 +2763,65 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 out:
 	rcu_read_unlock();
 }
-#endif
 
+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	int gen = READ_ONCE(vmap_info_gen);
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (READ_ONCE(vmap_info_cache_gen) == gen) {
+		int gen_after;
+
+		/*
+		 * The two read barriers make sure that we read
+		 * 'gen', 'vmap_info_cache' and 'gen_after' in
+		 * precisely that order:
+		 */
+		smp_rmb();
+		*vmi = vmap_info_cache;
+
+		smp_rmb();
+		gen_after = READ_ONCE(vmap_info_gen);
+
+		/* The cache is still valid: */
+		if (gen == gen_after)
+			return;
+
+		/* Ok, the cache got invalidated just now, regenerate it */
+		gen = gen_after;
+	}
+
+	/* Make sure 'gen' is read before the vmalloc info */
+	smp_rmb();
+
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * All updates to vmap_info_cache_gen go through this spinlock,
+	 * so when the cache got invalidated, we'll only mark it valid
+	 * again if we first fully write the new vmap_info_cache.
+	 *
+	 * This ensures that partial results won't be used.
+	 */
+	spin_lock(&vmap_info_lock);
+	if (gen-vmap_info_cache_gen > 0) {
+		vmap_info_cache = *vmi;
+		/*
+		 * Make sure the new cached data is visible before
+		 * the generation counter update:
+		 */
+		smp_wmb();
+
+		WRITE_ONCE(vmap_info_cache_gen, gen);
+	}
+	spin_unlock(&vmap_info_lock);
+}
+
+#endif /* CONFIG_PROC_FS */