On 2024/04/29 9:50, Linus Torvalds wrote:
> On Sun, 28 Apr 2024 at 16:23, Hillf Danton <hdanton@xxxxxxxx> wrote:
>>
>> So is game like copying from/putting to user with runqueue locked
>> at the first place.
>
> No, that should be perfectly fine. In fact, it's even normal. It would
> happen any time you have any kind of tracing thing, where looking up
> the user mode frame involves doing user accesses with page faults
> disabled.
>
> The runqueue lock is irrelevant. As mentioned, it's only a symptom of
> something else going wrong.
>
> Now, judging by the syz reproducer, the trigger for this all is almost
> certainly that
>
>    bpf$BPF_RAW_TRACEPOINT_OPEN(0x11,
> &(0x7f00000000c0)={&(0x7f0000000080)='sched_switch\x00', r0}, 0x10)
>
> and that probably causes the instability. But the immediate problem is
> not the user space access, it's that something goes horribly wrong
> *around* it.

I can't recall the title of the commit, but I feel that things went very
wrong after the introduction of a commit that allows running a tracing
function upon lock contention (i.e. running code when, e.g., a spinlock
could not be taken). That commit is forming random locking dependencies,
resulting in a flood of lockdep warnings.

>
>> Plus as per another syzbot report [1], bpf could make trouble with
>> workqueue pool locked.
>
> That seems to be entirely different. There's no unexplained page fault
> in that case, that seems to be purely a "take lock in the wrong order"
>
> Linus