* George Spelvin <linux@xxxxxxxxxxx> wrote:

> Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > I think this is too complex.
> >
> > How about something simple like the patch below (on top of the third
> > patch)? It makes the vmalloc info transactional - /proc/meminfo will
> > always print a consistent set of numbers. (Not that we really care
> > about races there, but it looks really simple to solve so why not.)
>
> Looks like a huge simplification!
>
> It needs a comment about the approximate nature of the locking and
> the obvious race conditions:
>
> 1) The first caller to get_vmalloc_info() clears vmap_info_changed
>    before updating vmap_info_cache, so a second caller is likely to
>    get stale data for the duration of a calc_vmalloc_info() call.
>
> 2) Although unlikely, it's possible for two threads to race calling
>    calc_vmalloc_info(), and the one that computes fresher data updates
>    the cache first, so the later write leaves stale data.
>
> Other issues:
>
> 3) Me, I'd make vmap_info_changed a bool, for documentation more than
>    any space saving.
>
> 4) I wish there were a trylock version of write_seqlock, so we could
>    avoid blocking entirely. (You *could* hand-roll it, but that eats
>    into the simplicity.)

Ok, fair enough - so how about the attached approach instead, which uses
a 64-bit generation counter to track changes to the vmalloc state.

This is still very simple, but it should not suffer from stale data being
returned indefinitely in /proc/meminfo. We might race - that was true
before as well, due to the lock-less RCU list walk - but we'll always
return a correct and consistent version of the information.

Lightly tested. This is a replacement patch, to make it easier to read
via email. I also made sure there's no extra overhead in the
!CONFIG_PROC_FS case.

Note that there's an even simpler variant possible, I think: we could use
just the two generation counters plus barriers to remove the seqlock.
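To make the "two generation counters plus barriers, no seqlock" idea concrete, here is a minimal userspace sketch of how such a variant could look. This is an illustrative model, not kernel code: all the names (`info_gen`, `cache_gen`, `calc_info()`, `run_demo()`, the shrunken `struct vmalloc_info`) are stand-ins, C11 atomics replace the kernel's READ_ONCE/WRITE_ONCE and barrier primitives, and the expensive list walk is faked by a counter so the cache-hit/miss behaviour can be observed.

```c
#include <stdatomic.h>

/*
 * Userspace model of a seqlock-free variant: writers bump info_gen on
 * every state change; a cache refill publishes the data first, then
 * release-stores the generation it was computed for into cache_gen;
 * readers acquire-load cache_gen, copy the cache, then re-check
 * cache_gen to detect a refill that raced with the copy.
 */
struct vmalloc_info { unsigned long used, largest_chunk; };

static _Atomic unsigned long long info_gen = 1;	/* state generation  */
static _Atomic unsigned long long cache_gen;	/* cached generation */
static struct vmalloc_info cache;		/* the cached data   */

static unsigned long fake_state;		/* stand-in state    */
static int calc_calls;				/* instrumentation   */

/* Stand-in for the expensive RCU list walk: */
static void calc_info(struct vmalloc_info *vmi)
{
	calc_calls++;
	vmi->used = fake_state;
	vmi->largest_chunk = 0;
}

/* Writer side: any state change invalidates the cache: */
void state_change(unsigned long used)
{
	fake_state = used;
	atomic_fetch_add(&info_gen, 1);
}

/* Reader side: serve from the cache when the generations match: */
int get_info(struct vmalloc_info *vmi)
{
	unsigned long long gen = atomic_load(&info_gen);

	if (atomic_load_explicit(&cache_gen, memory_order_acquire) == gen) {
		struct vmalloc_info snap = cache;

		/*
		 * If cache_gen is unchanged after the copy, no refill
		 * completed concurrently and snap is usable:
		 */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load(&cache_gen) == gen) {
			*vmi = snap;
			return 1;		/* cache hit  */
		}
	}
	calc_info(vmi);
	cache = *vmi;
	atomic_store_explicit(&cache_gen, gen, memory_order_release);
	return 0;				/* cache miss */
}

/* Single-threaded demo: two reads, one invalidation, one more read: */
int run_demo(void)
{
	struct vmalloc_info vmi;

	state_change(100);
	get_info(&vmi);		/* miss: computes and fills the cache */
	get_info(&vmi);		/* hit: served without recomputing    */
	state_change(200);
	get_info(&vmi);		/* miss again: the generation moved   */
	return calc_calls;	/* two recomputations total           */
}
```

Note the sketch only demonstrates the protocol shape; a real lock-free version would still need to handle a racing refill for the *same* generation tearing the copy, which is exactly what the seqlock in the patch below sidesteps.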
Thanks,

	Ingo

==============================>
>From f9fd770e75e2edb4143f32ced0b53d7a77969c94 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Sat, 22 Aug 2015 12:28:01 +0200
Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info

Linus reported that glibc (rather stupidly) reads /proc/meminfo for
every sysinfo() call, which causes the Git build to use a surprising
amount of CPU time, mostly due to the overhead of get_vmalloc_info() -
which walks a long list to do its statistics.

Modify Linus's jiffies based patch to use generation counters to cache
the vmalloc info: vmap_unlock() increases the generation counter, and
get_vmalloc_info() reads it and compares it against a cached generation
counter.

Also use a seqlock to make sure we always print a consistent set of
vmalloc statistics.

Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 mm/vmalloc.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 605138083880..d72b23436906 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING	0x02
 #define VM_VM_AREA	0x04
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * A seqlock and two generation counters for a simple cache of the
+ * vmalloc allocation statistics info printed in /proc/meminfo.
+ *
+ * ( The assumption of the optimization is that it's read frequently, but
+ *   modified infrequently. )
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
+static u64 vmap_info_gen;
+static u64 vmap_info_cache_gen;
+static struct vmalloc_info vmap_info_cache;
+#endif
 
 static inline void vmap_lock(void)
 {
@@ -285,6 +299,9 @@ static inline void vmap_lock(void)
 
 static inline void vmap_unlock(void)
 {
+#ifdef CONFIG_PROC_FS
+	WRITE_ONCE(vmap_info_gen, vmap_info_gen+1);
+#endif
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
-void get_vmalloc_info(struct vmalloc_info *vmi)
+static void calc_vmalloc_info(struct vmalloc_info *vmi)
 {
 	struct vmap_area *va;
 	unsigned long free_area_size;
@@ -2746,5 +2763,41 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 out:
 	rcu_read_unlock();
 }
-#endif
 
+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	u64 gen = READ_ONCE(vmap_info_gen);
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (READ_ONCE(vmap_info_cache_gen) == gen) {
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&vmap_info_lock);
+			*vmi = vmap_info_cache;
+		} while (read_seqretry(&vmap_info_lock, seq));
+
+		return;
+	}
+
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * If we are racing with a new vmalloc() then we might write
+	 * the old generation counter here - and the next call to
+	 * get_vmalloc_info() will fix things up:
+	 */
+	write_seqlock(&vmap_info_lock);
+	vmap_info_cache = *vmi;
+	WRITE_ONCE(vmap_info_cache_gen, gen);
+	write_sequnlock(&vmap_info_lock);
+}
+
+#endif /* CONFIG_PROC_FS */
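For readers following along outside the kernel tree, the patch's read path can be modeled in plain userspace C. The sketch below is an assumption-laden analogue, not the kernel implementation: a minimal sequence counter stands in for the kernel seqlock (single writer assumed), C11 atomics replace the kernel barrier primitives, `struct vmalloc_info` is shrunk to two fields, and the expensive list walk is replaced by an instrumented stub so the cache behaviour is visible.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model of the patch's logic; names mirror the patch. */
struct vmalloc_info { unsigned long used, largest_chunk; };

static _Atomic unsigned seqcount;		/* even = no writer   */
static unsigned long long vmap_info_gen = 1;
static unsigned long long vmap_info_cache_gen;
static struct vmalloc_info vmap_info_cache;

static unsigned long fake_state;		/* stand-in state     */
static int calc_calls;				/* instrumentation    */

static unsigned read_seqbegin_model(void)
{
	unsigned seq;

	do {	/* spin while a writer is mid-update (odd count): */
		seq = atomic_load_explicit(&seqcount, memory_order_acquire);
	} while (seq & 1);
	return seq;
}

static int read_seqretry_model(unsigned seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load(&seqcount) != seq;
}

static void write_seqlock_model(void)   { atomic_fetch_add(&seqcount, 1); }
static void write_sequnlock_model(void) { atomic_fetch_add(&seqcount, 1); }

/* Stand-in for the expensive RCU list walk: */
static void calc_vmalloc_info_model(struct vmalloc_info *vmi)
{
	calc_calls++;
	vmi->used = fake_state;
	vmi->largest_chunk = 0;
}

/* Models vmap_unlock(): every state change bumps the generation: */
void vmap_unlock_model(void)
{
	vmap_info_gen++;
}

void get_vmalloc_info_model(struct vmalloc_info *vmi)
{
	unsigned long long gen = vmap_info_gen;

	if (vmap_info_cache_gen == gen) {
		unsigned seq;

		do {	/* retry the copy if a writer intervened: */
			seq = read_seqbegin_model();
			*vmi = vmap_info_cache;
		} while (read_seqretry_model(seq));
		return;
	}

	calc_vmalloc_info_model(vmi);

	write_seqlock_model();
	vmap_info_cache = *vmi;
	vmap_info_cache_gen = gen;
	write_sequnlock_model();
}

/* Single-threaded demo: one invalidation, then two reads: */
int run_model(void)
{
	struct vmalloc_info vmi;

	fake_state = 4096;
	vmap_unlock_model();		/* state changed              */
	get_vmalloc_info_model(&vmi);	/* miss: recompute and refill */
	get_vmalloc_info_model(&vmi);	/* hit: copied under seqcount */
	assert(vmi.used == 4096);
	return calc_calls;		/* one recomputation total    */
}
```

The point of the two layers, which the model preserves: the generation counters decide *freshness* (do we recompute?), while the seqlock-style retry loop decides *consistency* (is the copied snapshot torn?). They address different races, which is why the patch needs both.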