On 2024/07/02 0:10, Andrey Konovalov wrote:
> This is weird, because if the metadata is 00, then the memory should
> be accessible and there should be no KASAN report.
>
> Which makes me believe you have some kind of a race in your patch (or
> there's a race in the kernel that your patch somehow exposes).

Yes, I consider that my patch is exposing an existing race, for I can't find
a race in my patch itself.

(Since https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b96342141183ffa62bfed5998f9b808c84042322
calls get_task_struct() when recording the in-use state, report_rtnl_holders()
can't trigger a use-after-free even if the thread has died. Also, since the
previous version
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5210cbe9a47fc5c1f43ba16d481e6335f3e2f345
synchronously calls synchronize_rcu() when clearing the in-use state,
report_rtnl_holders() can't trigger a use-after-free there either, because the
thread can't die before calling put_rtnl_holder(). The variable "now" can
never be 0, and !cmpxchg(&rtnl_started[idx], 0, now) must serve as a
serialization lock when recording the in-use state.)

> At least between the moment KASAN detected the issue and the moment the
> reporting procedure got to printing the memory state, the memory state
> changed.

Indeed, the fact that the exact line KASAN complains at varies suggests that
the memory state is being modified by somebody else.

> As this is stack memory that comes from a vmalloc allocation,
> I suspect the task whose stack had been at that location died, and
> something else got mapped there.

I consider that the task can't die while report_rtnl_holders() is calling
__show_regs().

> This is my best guess, I hope it's helpful.

Well, KASAN says "out-of-bounds". But the reported address

  BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
  Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234

is within the kernel stack memory mapping

  The buggy address belongs to the virtual mapping at
   [ffffc90003c48000, ffffc90003c51000) created by:
   copy_process+0x5d1/0x3d7

. Why is this "out-of-bounds"? What boundary did KASAN compare against? Is
this just a race between KASAN detecting a problem and KASAN reporting that
problem? (But as I explained above, it is unlikely that the thread being
reported can die while report_rtnl_holders() is running...)
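
To make the serialization I am relying on explicit, here is a simplified
sketch of the record/clear pattern. Only the !cmpxchg(&rtnl_started[idx], 0, now)
try-lock and the get_task_struct()/put_task_struct() pairing are what the
patch actually depends on; the array size, the rtnl_holder[] array, the
get_rtnl_holder()/put_rtnl_holder() signatures, and using jiffies as the time
source are illustrative assumptions, and the reporter-side synchronization is
elided:

#include <linux/atomic.h>
#include <linux/jiffies.h>
#include <linux/sched/task.h>

#define RTNL_HOLDERS 16	/* illustrative; not the size used in the patch */

static unsigned long rtnl_started[RTNL_HOLDERS];
static struct task_struct *rtnl_holder[RTNL_HOLDERS];

static int get_rtnl_holder(void)
{
	/* "now" must never be 0, because 0 means "this slot is free". */
	const unsigned long now = jiffies | 1UL;
	int idx;

	for (idx = 0; idx < RTNL_HOLDERS; idx++) {
		/*
		 * cmpxchg() returning 0 (the old value) means we transitioned
		 * this slot from free to in-use, i.e. we now own the slot.
		 */
		if (!cmpxchg(&rtnl_started[idx], 0, now)) {
			/* Pin the task so a concurrent reporter never sees it freed. */
			get_task_struct(current);
			rtnl_holder[idx] = current;
			return idx;
		}
	}
	return -1;	/* no free slot; the real patch handles this case differently */
}

static void put_rtnl_holder(int idx)
{
	struct task_struct *task;

	if (idx < 0)
		return;
	task = rtnl_holder[idx];
	rtnl_holder[idx] = NULL;
	/* Mark the slot free again so the next cmpxchg() can claim it. */
	smp_store_release(&rtnl_started[idx], 0);
	put_task_struct(task);
}

With this pattern, two threads can never record into the same slot
concurrently, and the task_struct being dumped stays pinned until
put_rtnl_holder() runs, which is why I don't see how the reporting side
could race with the recorded task dying.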