On 1/13/19 9:20 PM, David Lechner wrote: > On 1/11/19 8:58 PM, Michel Lespinasse wrote: >> On Fri, Jan 11, 2019 at 3:47 PM David Lechner <david@xxxxxxxxxxxxxx> wrote: >>> >>> On 1/11/19 2:58 PM, Qian Cai wrote: >>>> A GPF was reported, >>>> >>>> kasan: CONFIG_KASAN_INLINE enabled >>>> kasan: GPF could be caused by NULL-ptr deref or user memory access >>>> general protection fault: 0000 [#1] SMP KASAN >>>> kasan_die_handler.cold.22+0x11/0x31 >>>> notifier_call_chain+0x17b/0x390 >>>> atomic_notifier_call_chain+0xa7/0x1b0 >>>> notify_die+0x1be/0x2e0 >>>> do_general_protection+0x13e/0x330 >>>> general_protection+0x1e/0x30 >>>> rb_insert_color+0x189/0x1480 >>>> create_object+0x785/0xca0 >>>> kmemleak_alloc+0x2f/0x50 >>>> kmem_cache_alloc+0x1b9/0x3c0 >>>> getname_flags+0xdb/0x5d0 >>>> getname+0x1e/0x20 >>>> do_sys_open+0x3a1/0x7d0 >>>> __x64_sys_open+0x7e/0xc0 >>>> do_syscall_64+0x1b3/0x820 >>>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> >>>> It turned out, >>>> >>>> gparent = rb_red_parent(parent); >>>> tmp = gparent->rb_right; <-- GPF was triggered here. >>>> >>>> Apparently, "gparent" is NULL which indicates "parent" is rbtree's root >>>> which is red. Otherwise, it will be treated properly a few lines above. >>>> >>>> /* >>>> * If there is a black parent, we are done. >>>> * Otherwise, take some corrective action as, >>>> * per 4), we don't want a red root or two >>>> * consecutive red nodes. >>>> */ >>>> if(rb_is_black(parent)) >>>> break; >>>> >>>> Hence, it violates the rule #1 (the root can't be red) and need a fix >>>> up, and also add a regression test for it. This looks like was >>>> introduced by 6d58452dc06 where it no longer always paint the root as >>>> black. >>>> >>>> Fixes: 6d58452dc06 (rbtree: adjust root color in rb_insert_color() only >>>> when necessary) >>>> Reported-by: Esme <esploit@xxxxxxxxxxxxx> >>>> Tested-by: Joey Pabalinas <joeypabalinas@xxxxxxxxx> >>>> Signed-off-by: Qian Cai <cai@xxxxxx> >>>> --- >>> >>> Tested-by: David Lechner <david@xxxxxxxxxxxxxx> >>> FWIW, this fixed the following crash for me: >>> >>> Unable to handle kernel NULL pointer dereference at virtual address 00000004 >> >> Just to clarify, do you have a way to reproduce this crash without the fix ? > > I am starting to suspect that my crash was caused by some new code > in the drm-misc-next tree that might be causing a memory corruption. > It threw me off that the stack trace didn't contain anything related > to drm. > > See: https://patchwork.freedesktop.org/patch/276719/ > It may be useful for those who could reproduce this issue to turn on those memory corruption debug options to narrow down a bit. CONFIG_DEBUG_PAGEALLOC=y CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y CONFIG_KASAN=y CONFIG_KASAN_GENERIC=y CONFIG_SLUB_DEBUG_ON=y