On 06/28/2017 01:16 PM, Dmitry Vyukov wrote: > On Wed, Jun 28, 2017 at 12:02 PM, Sebastian Andrzej Siewior > <bigeasy@xxxxxxxxxxxxx> wrote: >> Trying to boot tip/master resulted in: >> |DMAR: dmar0: Using Queued invalidation >> |DMAR: dmar1: Using Queued invalidation >> |DMAR: Setting RMRR: >> |DMAR: Setting identity map for device 0000:00:1a.0 [0xbdcf9000 - 0xbdd1dfff] >> |BUG: unable to handle kernel NULL pointer dereference at (null) >> |IP: __domain_mapping+0x10f/0x3d0 >> |PGD 0 >> |P4D 0 >> | >> |Oops: 0002 [#1] PREEMPT SMP >> |Modules linked in: >> |CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc6-00117-g235a93822a21 #113 >> |task: ffff8805271c2c80 task.stack: ffffc90000058000 >> |RIP: 0010:__domain_mapping+0x10f/0x3d0 >> |RSP: 0000:ffffc9000005bca0 EFLAGS: 00010246 >> |RAX: 0000000000000000 RBX: 00000000bdcf9003 RCX: 0000000000000000 >> |RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 >> |RBP: ffffc9000005bd00 R08: ffff880a243e9780 R09: ffff8805259e67c8 >> |R10: 00000000000bdcf9 R11: 0000000000000000 R12: 0000000000000025 >> |R13: 0000000000000025 R14: 0000000000000000 R15: 00000000000bdcf9 >> |FS: 0000000000000000(0000) GS:ffff88052acc0000(0000) knlGS:0000000000000000 >> |CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> |CR2: 0000000000000000 CR3: 0000000001c0f000 CR4: 00000000000406e0 >> |Call Trace: >> | iommu_domain_identity_map+0x5a/0x80 >> | domain_prepare_identity_map+0x9f/0x160 >> | iommu_prepare_identity_map+0x7e/0x9b >> >> bisect points to commit 235a93822a21 ("locking/atomics, asm-generic: Add KASAN >> instrumentation to atomic operations"), RIP is at >> tmp = cmpxchg64_local(&pte->val, 0ULL, pteval); >> in drivers/iommu/intel-iommu.c. The assembly for this inline assembly >> is: >> xor %edx,%edx >> xor %eax,%eax >> cmpxchg %rbx,(%rdx) >> >> and as you see edx is set to zero and used later as a pointer via the >> full register. This happens with gcc-6, 5 and 8 (snapshot from last >> week). >> After a longer while of searching and swearing I figured out that this >> bug occures once cmpxchg64_local() and cmpxchg_local() uses the same >> ____ptr macro and they are shadow somehow. What I don't know why edx is >> set to zero. >> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> >> --- >> include/asm-generic/atomic-instrumented.h | 12 ++++++------ >> 1 file changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h >> index a0f5b7525bb2..ac6155362b39 100644 >> --- a/include/asm-generic/atomic-instrumented.h >> +++ b/include/asm-generic/atomic-instrumented.h >> @@ -359,16 +359,16 @@ static __always_inline bool atomic64_add_negative(s64 i, atomic64_t *v) >> >> #define cmpxchg64(ptr, old, new) \ >> ({ \ >> - __typeof__(ptr) ____ptr = (ptr); \ >> - kasan_check_write(____ptr, sizeof(*____ptr)); \ >> - arch_cmpxchg64(____ptr, (old), (new)); \ >> + __typeof__(ptr) ____ptr64 = (ptr); \ >> + kasan_check_write(____ptr64, sizeof(*____ptr64));\ >> + arch_cmpxchg64(____ptr64, (old), (new)); \ >> }) >> >> #define cmpxchg64_local(ptr, old, new) \ >> ({ \ >> - __typeof__(ptr) ____ptr = (ptr); \ >> - kasan_check_write(____ptr, sizeof(*____ptr)); \ >> - arch_cmpxchg64_local(____ptr, (old), (new)); \ >> + __typeof__(ptr) ____ptr64 = (ptr); \ >> + kasan_check_write(____ptr64, sizeof(*____ptr64));\ >> + arch_cmpxchg64_local(____ptr64, (old), (new)); \ >> }) >> >> #define cmpxchg_double(p1, p2, o1, o2, n1, n2) \ > > > Doh! Thanks for fixing this. I think I've a similar crash in a > different place when I developed the patch. > The problem is that when we do: > > __typeof__(ptr) ____ptr = (ptr); \ > arch_cmpxchg64_local(____ptr, (old), (new)); \ > > We don't necessary pass value of our just declared ____ptr to > arch_cmpxchg64_local(). We just pass a symbolic identifier. So if > arch_cmpxchg64_local() declares own ____ptr and then tries to use what > we passed ("____ptr") it will actually refer to own variable declared > rather than to what we wanted to pass in. > > In my case I ended up with something like: > > __typeof__(foo) __ptr = __ptr; > > which compiler decided to turn into 0. > > Thank you, macros. > > We can add more underscores, but the problem can happen again. Should > we prefix current function/macro name to all local vars?.. > The main problem here is that arch_cmpxchg64_local() calls cmpxhg_local() instead of using arch_cmpxchg_local(). So, the patch bellow should fix the problem, also this will fix double instrumentation of cmpcxchg64[_local](). But I haven't tested this patch yet. --- arch/x86/include/asm/cmpxchg_64.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h index fafaebacca2d..7046a3cc2493 100644 --- a/arch/x86/include/asm/cmpxchg_64.h +++ b/arch/x86/include/asm/cmpxchg_64.h @@ -9,13 +9,13 @@ static inline void set_64bit(volatile u64 *ptr, u64 val) #define arch_cmpxchg64(ptr, o, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ - cmpxchg((ptr), (o), (n)); \ + arch_cmpxchg((ptr), (o), (n)); \ }) #define arch_cmpxchg64_local(ptr, o, n) \ ({ \ BUILD_BUG_ON(sizeof(*(ptr)) != 8); \ - cmpxchg_local((ptr), (o), (n)); \ + arch_cmpxchg_local((ptr), (o), (n)); \ }) #define system_has_cmpxchg_double() boot_cpu_has(X86_FEATURE_CX16) -- 2.13.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>