On Tue, Jul 8, 2014 at 6:43 PM, Laura Abbott <lauraa@xxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I have an arm64 target which has been observed hanging in
> __purge_vmap_area_lazy in vmalloc.c. The root cause of this 'hang' is that
> flush_tlb_kernel_range is attempting to flush 255GB of virtual address
> space. This takes ~2 seconds, and preemption is disabled for that time
> thanks to the purge lock. Disabling preemption for that long is enough to
> trigger a watchdog we have set up.
>
> Triggering this is fairly easy:
> 1) Early in bootup, vmalloc > lazy_max_pages. This gives an address near
>    the start of the vmalloc range.
> 2) load a module
> 3) vfree the vmalloc region from step 1
> 4) unload the module
>
> The arm64 virtual address layout looks like
> vmalloc : 0xffffff8000000000 - 0xffffffbbffff0000   (245759 MB)
> vmemmap : 0xffffffbc02400000 - 0xffffffbc03600000   (    18 MB)
> modules : 0xffffffbffc000000 - 0xffffffc000000000   (    64 MB)
>
> and the algorithm in __purge_vmap_area_lazy flushes between the lowest and
> highest addresses being purged. Essentially, if we are using a reasonable
> amount of vmalloc space and a module unload triggers a vmalloc purge, we
> will end up triggering our watchdog.
>
> A couple of options I thought of:
> 1) Increase the timeout of our watchdog to allow the flush to occur. Nobody
>    I suggested this to likes the idea, as the watchdog firing generally
>    catches behavior that results in poor system performance, and disabling
>    preemption for that long does seem like a problem.
> 2) Change __purge_vmap_area_lazy to do less work under a spinlock. This
>    would certainly have a performance impact, and I don't even know if it
>    is plausible.
> 3) Allow module unloading to trigger a vmalloc purge beforehand to help
>    avoid this case. This would still be racy if another vfree came in
>    between the purge and the vfree, but it might be good enough.
> 4) Add 'if size > threshold, flush entire tlb' (I haven't profiled this
>    yet)

We have the same problem. I'd agree with point 2 and point 4; points 1/3 do
not actually fix this issue, since purge_vmap_area_lazy() could be called in
other cases. W.r.t. the threshold for flushing the entire TLB instead of
doing it page by page, that could differ from platform to platform. And
considering the cost of a TLB flush on x86, I wonder why this isn't an issue
there.

The whole of __purge_vmap_area_lazy() is protected by a single spinlock; I
see no reason why a mutex cannot be used there, which would allow preemption
during this likely lengthy process. The rbtree removal seems heavy too - in
the worst case __free_vmap_area() is called lazy_max_pages times, all under
a single spinlock held for the whole traversal, which is not necessary.

CC+ Russell, Catalin, Will.
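For reference, the ~255GB figure follows directly from the layout quoted
above: the region vmalloc'd in step 1 (and vfreed in step 3) sits near the
bottom of the vmalloc area, while the module unloaded in step 4 sits in the
modules area at the very top, so the purge ends up with a flush range of
roughly

	0xffffffc000000000 - 0xffffff8000000000 = 0x4000000000 bytes = 256GB

which flush_tlb_kernel_range() on arm64 then invalidates one page at a
time - hence the ~2 seconds spent with preemption disabled.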
We have a patch as below:

============================ >8 =========================

From cabc0fc0bdeee3d7824660b17f25ab465e3cc76b Mon Sep 17 00:00:00 2001
From: Eric Miao <emiao@xxxxxxxxxx>
Date: Mon, 7 Jul 2014 18:19:09 -0700
Subject: [PATCH 2/2] mm: vmalloc: allow preemption in __purge_vmap_area_lazy()

Signed-off-by: Eric Miao <emiao@xxxxxxxxxx>
Signed-off-by: Richard Wiley <rwiley@xxxxxxxxxx>
---
 mm/vmalloc.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 35b19d4..35a2d36 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -597,7 +597,7 @@ void set_iounmap_nonlazy(void)
 static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
 					int sync, int force_flush)
 {
-	static DEFINE_SPINLOCK(purge_lock);
+	static DEFINE_MUTEX(purge_lock);
 	LIST_HEAD(valist);
 	struct vmap_area *va;
 	struct vmap_area *n_va;
@@ -609,10 +609,10 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
 	 * the case that isn't actually used at the moment anyway.
 	 */
 	if (!sync && !force_flush) {
-		if (!spin_trylock(&purge_lock))
+		if (!mutex_trylock(&purge_lock))
 			return;
 	} else
-		spin_lock(&purge_lock);
+		mutex_lock(&purge_lock);
 
 	if (sync)
 		purge_fragmented_blocks_allcpus();
@@ -644,12 +644,13 @@ static void __purge_vmap_area_lazy(unsigned long *start, unsigned long *end,
 	}
 
 	if (nr) {
-		spin_lock(&vmap_area_lock);
-		list_for_each_entry_safe(va, n_va, &valist, purge_list)
+		list_for_each_entry_safe(va, n_va, &valist, purge_list) {
+			spin_lock(&vmap_area_lock);
 			__free_vmap_area(va);
-		spin_unlock(&vmap_area_lock);
+			spin_unlock(&vmap_area_lock);
+		}
 	}
-	spin_unlock(&purge_lock);
+	mutex_unlock(&purge_lock);
 }
 
 /*
-- 
1.8.1.5
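For point 4, the change itself would be small - something along the lines of
the sketch below. This is only an illustration: the helper name and the
threshold value are made up here, and as noted above the break-even point
between a ranged flush and a full flush needs profiling and will likely
differ between platforms.

/*
 * Sketch only, not part of the patch above: fall back to a full TLB
 * flush once the range being purged is so large that flushing it page
 * by page costs more than refilling the TLB from scratch.
 */
#define LAZY_PURGE_FLUSH_ALL_THRESHOLD	(512UL << 20)	/* placeholder value */

static void purge_flush_tlb(unsigned long start, unsigned long end)
{
	if (end - start > LAZY_PURGE_FLUSH_ALL_THRESHOLD)
		flush_tlb_all();
	else
		flush_tlb_kernel_range(start, end);
}

__purge_vmap_area_lazy() would then call purge_flush_tlb(*start, *end) where
it currently calls flush_tlb_kernel_range(*start, *end).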