On Mon, Jul 1, 2024 at 2:43 PM Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello, KASAN people.
>
> I suspect that KASAN's metadata for kernel stack memory got out of sync for
> unknown reason, for the stack trace of PID=7558 was successfully printed for
> two times before KASAN complains upon trying to print for the third time.
> Would you decode what is this KASAN message saying?
>
> Quoting from https://syzkaller.appspot.com/text?tag=CrashLog&x=119fd081980000 :

[...]

> [ 229.319713][ C0] ==================================================================
> [ 229.327779][ C0] BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
> [ 229.335174][ C0] Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234

[...]

> [ 230.044183][ C0] Memory state around the buggy address:
> [ 230.049816][ C0]  ffffc90003c4f680: f2 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
> [ 230.057889][ C0]  ffffc90003c4f700: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
> [ 230.065961][ C0] >ffffc90003c4f780: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
> [ 230.074059][ C0]                             ^
> [ 230.078915][ C0]  ffffc90003c4f800: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
> [ 230.086983][ C0]  ffffc90003c4f880: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
> [ 230.095056][ C0] ==================================================================

I checked some of the other syzbot reports for this bug, and the memory
state part in some of them looks different.

Specifically, for https://syzkaller.appspot.com/text?tag=CrashLog&x=14293f0e980000:

[ 1558.929174][ C1] Memory state around the buggy address:
[ 1558.934796][ C1]  ffffc9000b8bf400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1558.942852][ C1]  ffffc9000b8bf480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1558.950897][ C1] >ffffc9000b8bf500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1558.958943][ C1] ^
[ 1558.964569][ C1]  ffffc9000b8bf580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1558.972613][ C1]  ffffc9000b8bf600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This is weird, because if the metadata is 00, then the memory should be
accessible and there should be no KASAN report.

This makes me believe there is some kind of race in your patch (or a
race in the kernel that your patch somehow exposes). At the very least,
the memory state changed between the moment KASAN detected the issue and
the moment the reporting procedure got to printing the memory state.

As this is stack memory that comes from a vmalloc allocation, I suspect
the task whose stack had been at that location died, and something else
got mapped there.

This is my best guess; I hope it's helpful.