On Tue, Jun 19, 2018 at 6:08 AM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote: > On Tue, Jun 19, 2018 at 5:59 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: >> Hi Jason, yes please do send me the test suite with the kernel config. > > $ git clone https://git.zx2c4.com/WireGuard > $ cd WireGuard/src > $ [[ $(gcc -v 2>&1) =~ gcc\ version\ 8\.1\.0 ]] || echo crash needs 8.1 > $ export DEBUG_KERNEL=yes > $ export KERNEL_VERSION=4.18-rc1 > $ make test-qemu -j$(nproc) > > This will build a kernel and a minimal userland and load it in qemu, > which must be installed. > > This code is what causes the crash: > The self test that's executed: > https://git.zx2c4.com/WireGuard/tree/src/selftest/ratelimiter.h > Which exercises this code: > https://git.zx2c4.com/WireGuard/tree/src/ratelimiter.c > > The problem occurs after gc_entries(NULL) frees things (line 124 in > ratelimiter.h above), and then line 133 reallocates those objects. > Sometime after that happens, elsewhere in the kernel invokes this > kasan issue in the kasan cache cleanup. > > I realize it's disappointing that the test case here is in WireGuard, > which isn't (yet!) upstream. That's why in my original message I > wrote: > >> Rather, it looks like this >> commit introduces a performance optimization, rather than a >> correctness fix, so it seems that whatever test case is failing is >> likely an incorrect failure. Does that seem like an accurate >> possibility to you? > > I was hoping to only point you toward my own code after establishing > the possibility that the bug is not my own. If you still think there's > a chance this is due to my own correctness issue, and your commit has > simply unearthed it, let me know and I'll happily keep debugging on my > own before pinging you further. Hi Jason, Your code frees all entries before freeing the cache, right? If you add total_entries check before freeing the cache, it does not fire, right? Are you using SLAB or SLUB? We stress kernel pretty heavily, but with SLAB, and I suspect Shakeel may also be using SLAB. So if you are using SLUB, there is significant chance that it's a bug in the SLUB part of the change.