On Mon, Apr 15, 2019 at 7:58 AM Jitendra Sharma <shajit@xxxxxxxxxxxxxx> wrote:
>
> Hi Kees Cook/Luis,
>
> We are observing one kernel crash in next_tgid function through
> getdents64 path. Call stack is as shown below:
>
> -000|has_group_leader_pid(inline)
> -000|next_tgid(
>     |    [X20] ns = 0xFFFFFF87CABB1AC0,
>     |    [locdesc] iter = (
>     |      [locdesc] tgid = 424,
>     |      [locdesc] task = ?))
>     |  [X21] p = 0xFFFFFFD0FFFFF948
>     |  [X21] task = 0xFFFFFFD0FFFFF948
> -001|proc_pid_readdir(
>     |    [X20] file = 0xFFFFFFD1AC60FC40,
>     |    [X19] ctx = 0xFFFFFF8027363E40)
>     |  [X21] ns = 0xFFFFFF87CABB1AC0
> -002|proc_root_readdir(
>     |    [X20] file = 0xFFFFFFD1AC60FC40,
>     |    [X19] ctx = 0xFFFFFF8027363E40)
> -003|iterate_dir(
>     |    [X19] file = 0xFFFFFFD1AC60FC40,
>     |    [X22] ctx = 0xFFFFFF8027363E40)
>     |  [X23] inode = 0xFFFFFFD1F20246D0
> -004|SYSC_getdents64(inline)
> -004|sys_getdents64(
>     |    ?,
>     |    ?,
>     |    [X19] count = 4200)
>     |  [X19] count = 4200
>     |  [X20] f = ([X20] file = 0xAC60FC43AC60FC40, [X20] flags = 1207898624)
>     |  [X0] error = -1720
> -005|el0_svc_naked(asm)
>  -->|exception
> -006|NUX:0x78C5AD7D38(asm)
>  ---|end of frame
>
>
> From this call stack, task: 0xFFFFFFD0FFFFF948, seems to be invalid.
> As (from ramdumps) it doesn't have any valid fields. And while trying to
> access the fields of this task struct in has_group_leader_pid, abort is
> happening.
>
> From the dumps, it's not clear why the task struct is coming to be some
> invalid (Possibly task has already exited). This issue is observed
> during normal monkey testing for long hours.
>
> Could you please provide some pointers which could help in debugging
> this issue further.

Do you have any hints on how to reproduce this? I assume something is
missing proper locking or RCU handling, but I don't see anything
obvious in the surrounding code yet...

--
Kees Cook
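
In case it helps with narrowing down a reproducer, here is a rough,
untested-against-this-bug sketch of a userspace stress loop. It is only
a guess at the triggering pattern: one process keeps listing /proc
(which should end up in getdents64 and proc_pid_readdir) while the
parent keeps creating and reaping short-lived children. The function
name stress_proc_readdir and its parameters are made up for
illustration, and nothing here is confirmed to hit the crash above.

```python
import os


def stress_proc_readdir(listings=200, proc_root="/proc"):
    """Race /proc directory walks against task creation and reaping.

    Hypothetical helper, not taken from the report. Returns a tuple
    (churned, ok): how many children the parent forked and reaped, and
    whether the reader process exited cleanly.
    """
    reader = os.fork()
    if reader == 0:
        # Reader child: walk the directory repeatedly. On Linux this is
        # typically serviced by getdents64 on the directory fd.
        status = 1
        try:
            for _ in range(listings):
                os.listdir(proc_root)
            status = 0
        finally:
            os._exit(status)

    churned = 0
    while True:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # short-lived task: exit immediately
        # Reaping the zombie is the point where the kernel is expected
        # to release the task, which is the window we want to overlap
        # with the reader's directory walk.
        os.waitpid(pid, 0)
        churned += 1

        done, wstatus = os.waitpid(reader, os.WNOHANG)
        if done == reader:
            ok = os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0
            return churned, ok
```

Running this with a large listings value on several CPUs at once, on
the affected kernel, might widen the window; whether it actually does
is an assumption on my side.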