Re: [PATCH] mm: Remove pointless might_sleep() in remove_vm_area().

Tetsuo Handa wrote:
> . This patch will not break CONFIG_PREEMPT_COUNT=n case because
> in_interrupt() is evaluated as false because preempt_count() is always 0.

> -	if (unlikely(in_interrupt()))
> +	if (unlikely(preempt_count() || irqs_disabled() || rcu_preempt_depth()))

Oops, I got confused. preemptible() is always 0 in the CONFIG_PREEMPT_COUNT=n case,
so I think the above condition is wrong. An updated patch is shown below.

From 3dd03c34ee45fbdb3c8fd31b558a76db3a562b22 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 27 Mar 2017 10:53:08 +0900
Subject: [PATCH v2] mm: Allow calling vfree() from non-schedulable context.

Commit 5803ed292e63a1bf ("mm: mark all calls into the vmalloc subsystem
as potentially sleeping") added might_sleep() to remove_vm_area(), which
vfree() reaches via __vunmap(), and is causing

[    2.616064] BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
[    2.616125] in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
[    2.616156] 2 locks held by plymouthd/341:
[    2.616158]  #0:  (drm_global_mutex){+.+.+.}, at: [<ffffffffc01c274b>] drm_release+0x3b/0x3b0 [drm]
[    2.616256]  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [<ffffffffc0173038>] ttm_object_file_release+0x28/0x90 [ttm]
[    2.616270] CPU: 2 PID: 341 Comm: plymouthd Not tainted 4.11.0-0.rc3.git0.1.kmallocwd.fc25.x86_64+debug #1
[    2.616271] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[    2.616273] Call Trace:
[    2.616281]  dump_stack+0x86/0xc3
[    2.616285]  ___might_sleep+0x17d/0x250
[    2.616289]  __might_sleep+0x4a/0x80
[    2.616293]  remove_vm_area+0x22/0x90
[    2.616296]  __vunmap+0x2e/0x110
[    2.616299]  vfree+0x42/0x90
[    2.616304]  kvfree+0x2c/0x40
[    2.616312]  drm_ht_remove+0x1a/0x30 [drm]
[    2.616317]  ttm_object_file_release+0x50/0x90 [ttm]
[    2.616324]  vmw_postclose+0x47/0x60 [vmwgfx]
[    2.616331]  drm_release+0x290/0x3b0 [drm]
[    2.616338]  __fput+0xf8/0x210
[    2.616342]  ____fput+0xe/0x10
[    2.616345]  task_work_run+0x85/0xc0
[    2.616351]  exit_to_usermode_loop+0xb4/0xc0
[    2.616355]  do_syscall_64+0x185/0x1f0
[    2.616359]  entry_SYSCALL64_slow_path+0x25/0x25

warning.

And commit 763b218ddfaf5676 ("mm: add preempt points into
__purge_vmap_area_lazy()") actually made vfree() potentially sleep on
non-preemptible kernels. But we want vfree() to remain callable from
non-schedulable context, as kfree() is, because vfree() is called via
kvfree().

This patch updates the condition under which vfree() uses
__vfree_deferred(), so that all vfree()/kvfree() users who did not notice
that commit remain safe.
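
To illustrate the idea of deferring the free, here is a minimal userspace
sketch. It is not the kernel implementation: model_vfree(), struct free_ctx,
model_drain_deferred() and model_freed_directly() are hypothetical names,
malloc()/free() stand in for vmalloc()/__vunmap(), and the context is passed
in explicitly instead of being read from preempt_count(). A caller that may
not schedule only links the buffer onto a pending list; something that is
allowed to sleep drains the list later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the caller's context (not a kernel type). */
struct free_ctx {
	bool preemptible;	/* models preemptible() */
	int rcu_depth;		/* models rcu_preempt_depth() */
};

/* The freed buffer itself is reused as the list node. */
struct deferred_node {
	struct deferred_node *next;
};

static struct deferred_node *deferred_head;
static size_t freed_directly;

/* Free immediately only when the caller is allowed to schedule. */
void model_vfree(void *addr, const struct free_ctx *ctx)
{
	if (!addr)
		return;
	if (!ctx->preemptible || ctx->rcu_depth) {
		/* Deferred path: only queue the buffer, never block. */
		struct deferred_node *node = addr;

		node->next = deferred_head;
		deferred_head = node;
	} else {
		/* Direct path: stands in for __vunmap(addr, 1). */
		free(addr);
		freed_directly++;
	}
}

/* Runs later from a context that may sleep; returns buffers released. */
size_t model_drain_deferred(void)
{
	size_t released = 0;

	while (deferred_head) {
		struct deferred_node *next = deferred_head->next;

		free(deferred_head);
		deferred_head = next;
		released++;
	}
	return released;
}

size_t model_freed_directly(void)
{
	return freed_directly;
}
```

With this model, a free issued with preemptible == false or rcu_depth > 0
leaves the buffer pending until model_drain_deferred() runs, while a free
from an ordinary task context is released at once.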

console_unlock() is a function which is prepared to be called from
non-schedulable context (e.g. with a spinlock held, or inside an RCU
read-side critical section). It uses

  !oops_in_progress && preemptible() && !rcu_preempt_depth()

as the condition for whether it is safe to schedule. This patch uses the
same condition, with the oops_in_progress check (which is not important
for the __vunmap() case) removed.

The straightforward change would be

-	if (unlikely(in_interrupt()))
+	if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))

in vfree(). But we can remove the in_interrupt() check for the reasons below.

If CONFIG_PREEMPT_COUNT=y, in_interrupt() and preemptible() are defined as

  #define in_interrupt() (irq_count())
  #define irq_count()    (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK))
  #define preemptible()  (preempt_count() == 0 && !irqs_disabled())

and therefore this condition can be rewritten as below.

-	if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     !(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     (preempt_count() != 0 || irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
+		     preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))

-	if (unlikely((preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) ||
-		     preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))
+	if (unlikely(preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))

-	if (unlikely(preempt_count() != 0 || irqs_disabled() || rcu_preempt_depth()))
+	if (unlikely(!(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))

-	if (unlikely(!(preempt_count() == 0 && !irqs_disabled()) || rcu_preempt_depth()))
+	if (unlikely(!preemptible() || rcu_preempt_depth()))
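
The rewriting above can be cross-checked with a small userspace model. This
is only a sketch under stated assumptions: the MODEL_*_MASK values are
illustrative bit ranges above the low preempt-depth bits, struct model_ctx
and all model_*/cond_* names are hypothetical, and the context is a plain
struct rather than real per-CPU state. It compares the straightforward
condition with the simplified one over a sample of states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative mask layout (an assumption, not taken from the patch). */
#define MODEL_SOFTIRQ_MASK 0x0000ff00u
#define MODEL_HARDIRQ_MASK 0x000f0000u
#define MODEL_NMI_MASK     0x00100000u

/* Hypothetical snapshot of the state the real macros would read. */
struct model_ctx {
	unsigned int preempt_count;
	bool irqs_disabled;
	int rcu_depth;
};

static bool model_in_interrupt(const struct model_ctx *c)
{
	return (c->preempt_count &
		(MODEL_HARDIRQ_MASK | MODEL_SOFTIRQ_MASK | MODEL_NMI_MASK)) != 0;
}

static bool model_preemptible(const struct model_ctx *c)
{
	return c->preempt_count == 0 && !c->irqs_disabled;
}

/* in_interrupt() || !(preemptible() && !rcu_preempt_depth()) */
static bool cond_straightforward(const struct model_ctx *c)
{
	return model_in_interrupt(c) ||
	       !(model_preemptible(c) && !c->rcu_depth);
}

/* !preemptible() || rcu_preempt_depth() */
static bool cond_simplified(const struct model_ctx *c)
{
	return !model_preemptible(c) || c->rcu_depth;
}

/* Returns the number of sampled states in which the two forms disagree. */
int count_mismatches(void)
{
	static const unsigned int counts[] = {
		0, 1, 2, 0x100, 0x10000, 0x100000, 0x10101
	};
	int mismatches = 0;

	for (size_t i = 0; i < sizeof(counts) / sizeof(counts[0]); i++)
		for (int irq = 0; irq <= 1; irq++)
			for (int rcu = 0; rcu <= 2; rcu++) {
				struct model_ctx c = { counts[i], irq != 0, rcu };

				if (cond_straightforward(&c) != cond_simplified(&c))
					mismatches++;
			}
	return mismatches;
}
```

The two forms agree because any interrupt bit set in preempt_count already
makes preempt_count nonzero, so in_interrupt() implies !preemptible().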

If CONFIG_PREEMPT_COUNT=n, preemptible() is defined as

  #define preemptible() 0

and therefore this condition is always true, just as
!preemptible() || rcu_preempt_depth() is, as the rewriting below shows.

-       if (unlikely(in_interrupt() || !(preemptible() && !rcu_preempt_depth())))
+       if (unlikely(in_interrupt() || !(0 && !rcu_preempt_depth())))

-       if (unlikely(in_interrupt() || !(0 && !rcu_preempt_depth())))
+       if (unlikely(in_interrupt() || !(0)))

-       if (unlikely(in_interrupt() || !(0)))
+       if (unlikely(in_interrupt() || 1))

-       if (unlikely(in_interrupt() || 1))
+       if (unlikely(1))
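
The CONFIG_PREEMPT_COUNT=n case can be modelled the same way in userspace.
In this sketch (all names hypothetical) model_preemptible_n() always returns
false, mirroring "#define preemptible() 0", and both the straightforward and
the simplified condition are evaluated for every combination of the
remaining inputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "#define preemptible() 0" for CONFIG_PREEMPT_COUNT=n. */
static bool model_preemptible_n(void)
{
	return false;
}

/* in_interrupt() || !(preemptible() && !rcu_preempt_depth()) */
bool cond_straightforward_n(bool in_interrupt, int rcu_depth)
{
	return in_interrupt || !(model_preemptible_n() && !rcu_depth);
}

/* !preemptible() || rcu_preempt_depth() */
bool cond_simplified_n(int rcu_depth)
{
	return !model_preemptible_n() || rcu_depth;
}

/* Counts evaluations of either form that come out false. */
int count_false_results(void)
{
	int false_results = 0;

	for (int irq = 0; irq <= 1; irq++)
		for (int rcu = 0; rcu <= 2; rcu++) {
			if (!cond_straightforward_n(irq != 0, rcu))
				false_results++;
			if (!cond_simplified_n(rcu))
				false_results++;
		}
	return false_results;
}
```

In this model neither form is ever false, so vfree() would always take the
__vfree_deferred() path on such kernels.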

Also drop the unlikely() annotation, because a caller holding a spinlock or
running inside an RCU read-side critical section is not such an uncommon case.

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Jisheng Zhang <jszhang@xxxxxxxxxxx>
Cc: Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx>
Cc: Joel Fernandes <joelaf@xxxxxxxxxx>
Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: John Dias <joaodias@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> # v4.10
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0b05762..36334ff 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1589,7 +1589,7 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
+	if (!preemptible() || rcu_preempt_depth())
 		__vfree_deferred(addr);
 	else
 		__vunmap(addr, 1);
-- 
1.8.3.1
