Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Mon, 29 Apr 2024 10:00:30 +0900

On 2024/04/29 9:50, Linus Torvalds wrote:
> On Sun, 28 Apr 2024 at 16:23, Hillf Danton <hdanton@xxxxxxxx> wrote:
>>
>> So is game like copying from/putting to user with runqueue locked
>> at the first place.
> 
> No, that should be perfectly fine. In fact, it's even normal. It would
> happen any time you have any kind of tracing thing, where looking up
> the user mode frame involves doing user accesses with page faults
> disabled.
> 
> The runqueue lock is irrelevant. As mentioned, it's only a symptom of
> something else going wrong.
> 
> Now, judging by the syz reproducer, the trigger for this all is almost
> certainly that
> 
>    bpf$BPF_RAW_TRACEPOINT_OPEN(0x11,
> &(0x7f00000000c0)={&(0x7f0000000080)='sched_switch\x00', r0}, 0x10)
> 
> and that probably causes the instability. But the immediate problem is
> not the user space access, it's that something goes horribly wrong
> *around* it.

I can't recall title of the commit, but I feel that things went very wrong
after a commit that allows running tracing function upon lock contention
(run code when e.g. a spinlock could not be taken) was introduced.
That commit is forming random locking dependency, resulting in flood of
lockdep warnings.

> 
>> Plus as per another syzbot report [1], bpf could make trouble with
>> workqueue pool locked.
> 
> That seems to be entirely different. There's no unexplained page fault
> in that case, that seems to be purely a "take lock in the wrong order"
> 
>                 Linus