kasan: bug in quarantine_remove_cache

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Mon, 6 Mar 2017 17:03:38 +0100

Hi,

I think I see a nasty race condition in quarantine_remove_cache. It
seems to manifest as the following crash, I've seen several of them
but did not have any explanation.


BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
PGD 6aeea067
PUD 60ed7067
PMD 0
Oops: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88005f948040 task.stack: ffff880069818000
RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155
RSP: 0018:ffff88006981f298 EFLAGS: 00010246
RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f
RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000
RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d
R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0
R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000
FS:  00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0
Call Trace:
 quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239
 kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
 slab_post_alloc_hook mm/slab.h:456 [inline]
 slab_alloc_node mm/slub.c:2718 [inline]
 kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754
 __alloc_skb+0x10f/0x770 net/core/skbuff.c:219
 alloc_skb include/linux/skbuff.h:932 [inline]
 _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388
 sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline]
 sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746
 sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266
 sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1685
 SyS_sendto+0x40/0x50 net/socket.c:1653
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007fb911a2cb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004458b9
RDX: 00000000000000ae RSI: 0000000020477000 RDI: 0000000000000005
RBP: 00000000006e1ba0 R08: 0000000020477000 R09: 0000000000000010
R10: 0000000000000040 R11: 0000000000000282 R12: 0000000000708000
R13: 0000000000000000 R14: 00007fb911a2d9c0 R15: 00007fb911a2d700
Code: e5 41 57 41 56 41 55 41 54 53 48 89 f3 48 8b 37 48 85 f6 0f 84
8e 00 00 00 49 89 fd 49 c7 c6 ae 7f 87 81 41 bf 00 00 00 80 eb 1d <48>
63 87 c8 00 00 00 4c 8b 26 4c 89 f2 48 29 c6 e8 4d ce ff ff
RIP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155 RSP: ffff88006981f298
CR2: 00000000000000c8
---[ end trace a5485c8c9b67efdd ]---

quarantine_remove_cache frees all pending objects that belong to the
cache, before we destroy the cache itself. However there are 2
possibilities how it can fail to do so.

First, another thread can hold some of the objects from the cache in
temp list in quarantine_put. quarantine_put has a windows of enabled
interrupts, and on_each_cpu in quarantine_remove_cache can finish
right in that window. These objects will be later freed into the
destroyed cache.

Then, quarantine_reduce has the same problem. It grabs a batch of
objects from the global quarantine, then unlocks quarantine_lock and
then frees the batch. quarantine_remove_cache can finish while some
objects from the cache are still in the local to_free list in
quarantine_reduce.

I am trying to find a reasonably simple, elegant and performant
solution for this.
What I have now is this:

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 6f1ed1630873..58c280bd5a4d 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -180,10 +180,7 @@ void quarantine_put(struct kasan_free_meta *info,
struct kmem_cache *cache)
         if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE))
                 qlist_move_all(q, &temp);

-        local_irq_restore(flags);
-
-        if (unlikely(!qlist_empty(&temp))) {
-                spin_lock_irqsave(&quarantine_lock, flags);
+                spin_lock(&quarantine_lock);
                 WRITE_ONCE(quarantine_size, quarantine_size + temp.bytes);
                 qlist_move_all(&temp, &global_quarantine[quarantine_tail]);
                 if (global_quarantine[quarantine_tail].bytes >=
@@ -196,8 +193,10 @@ void quarantine_put(struct kasan_free_meta *info,
struct kmem_cache *cache)
                         if (new_tail != quarantine_head)
                                 quarantine_tail = new_tail;
                 }
-                spin_unlock_irqrestore(&quarantine_lock, flags);
+                spin_unlock(&quarantine_lock);
         }
+
+        local_irq_restore(flags);
 }

 void quarantine_reduce(void)
@@ -210,7 +209,8 @@ void quarantine_reduce(void)
                    READ_ONCE(quarantine_max_size)))
                 return;

-        spin_lock_irqsave(&quarantine_lock, flags);
+        local_irq_save(flags);
+        spin_lock(&quarantine_lock);

         /*
          * Update quarantine size in case of hotplug. Allocate a fraction of
@@ -234,9 +234,11 @@ void quarantine_reduce(void)
                         quarantine_head = 0;
         }

-        spin_unlock_irqrestore(&quarantine_lock, flags);
+        spin_unlock(&quarantine_lock);

         qlist_free_all(&to_free, NULL);
+
+        local_irq_restore(flags);
 }

 static void qlist_move_cache(struct qlist_head *from,
@@ -288,4 +290,6 @@ void quarantine_remove_cache(struct kmem_cache *cache)
         spin_unlock_irqrestore(&quarantine_lock, flags);

         qlist_free_all(&to_free, cache);
+
+        synchronize_sched();
 }


If we disable interrupts for the most part of quarantine_put and
quarantine_reduce, and do 2 synchronize_sched's in
quarantine_remove_cache (one being on_each_cpu), then it should
resolve the race.
However, I am not sure if it is OK to disable interrupts when we are
freeing a large batch in quarantine_reduce. And probably
synchronize_sched in quarantine_remove_cache will have negative
performance impact for container workloads. What do you think?
Is using rcu_read_lock/unlock and synchronize_rcu better?
Or maybe we could do own simplified version of rcu with 2 epochs
(effectively counters of pending quarantine_put/reduce). But I really
don't like going that way complexity-wise.

Any other suggestions?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>