Jakub Kicinski <kuba@xxxxxxxxxx> wrote:

> Hi David, any ideas about this one?  Looks like it triggers on fairly
> recent upstream?

I've finally managed to reproduce it.  Instrumenting the pipe_lock/unlock
functions, splice_to_socket() and pipe_release() seems to show that
pipe_release() is being called whilst splice_to_socket() is still running.

I *think* syzbot is arranging things such that splice_to_socket() takes a
significant amount of time so that another thread can close the socket as it
exits.

In this sample logging, the pipe is created by pid 7101:

    [   66.205719] --pipe 7101
    [   66.209942] lock
    [   66.212526] locked
    [   66.215344] unlock
    [   66.218103] unlocked

splice begins in 7101 also and locks the pipe:

    [   66.221057] ==>splice_to_socket() 7101
    [   66.225596] lock
    [   66.228177] locked

but for some reason, pid 7100 then tries to release it:

    [   66.377781] release 7100

and hangs on the __pipe_lock() call in pipe_release():

    [   66.381059] lock

The syz reproducer does weird things with threading - and I'm wondering if
there's a file struct refcount bug here.  Note that splice_to_socket() can't
access the pipe file structs to alter the refcount, and the involved pipe
isn't communicated to udp_sendmsg() in any way - so if there is a refcount
bug, it must be somewhere in the VFS, the pipe driver or the splice
infrastructure :-/.

I'm also not sure what's going on inside udp_sendmsg() as yet.  It doesn't
show a stack in /proc/7101/stack, which means it doesn't hit a schedule().

David