On 4/24/24 2:28 PM, Christian König wrote:
> I don't fully understand how that happens either, it could be that there is some bug in the EPOLL_FD code. Maybe it's a race when the EPOLL file descriptor is closed or something like that.

IIUC the race condition looks like the following:

Thread 0                            Thread 1
-> do_epoll_ctl()
   f_count++, now 2
   ...
   ...                              -> vfs_poll(), f_count == 2
   ...                                 ...
<- do_epoll_ctl()                      ...
   f_count--, now 1                    ...
-> filp_close(), f_count == 1          ...
   ...                                 -> dma_buf_poll(), f_count == 1
   -> fput()                              ...
                                          [*** race window ***]
      f_count--, now 0                    -> maybe get_file(), now ???
   -> __fput() (delayed)

E.g. dma_buf_poll() may be entered in thread 1 with f_count == 1 and call get_file() only shortly afterwards (or even skip it if there is nothing to poll for EPOLLIN or EPOLLOUT). During this time window, thread 0 may call fput() (on behalf of close() in this example) and, since it sees f_count == 1, drop the count to 0 and schedule the file for delayed_fput().

Dmitry
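
To make the window above concrete, here is a minimal userspace sketch in C11 that replays the interleaving sequentially. It is not kernel code and not a proposed fix: struct fake_file and all fake_*() / replay_*() names are hypothetical stand-ins, and only the reference count is modelled. It contrasts an unconditional increment (the get_file() pattern, which assumes the caller already holds a reference) with an increment that refuses to go from 0 to 1 (the pattern behind the kernel's atomic_long_inc_not_zero() helper).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct fake_file {
	atomic_long f_count;
	bool released;		/* set once the last reference is dropped */
};

/* Models fput(): drop a reference, release on the 1 -> 0 transition. */
static void fake_fput(struct fake_file *f)
{
	if (atomic_fetch_sub(&f->f_count, 1) == 1)
		f->released = true;
}

/* Models get_file(): unconditional increment. */
static void fake_get_file(struct fake_file *f)
{
	atomic_fetch_add(&f->f_count, 1);
}

/* Models the inc-not-zero pattern: take a reference only if count != 0. */
static bool fake_get_file_not_zero(struct fake_file *f)
{
	long old = atomic_load(&f->f_count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&f->f_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true if a reference was taken on an already released file. */
static bool replay_unconditional(void)
{
	struct fake_file f = { .released = false };

	atomic_init(&f.f_count, 1);
	assert(atomic_load(&f.f_count) == 1);	/* thread 1 enters poll */
	fake_fput(&f);				/* thread 0: close() path */
	fake_get_file(&f);			/* thread 1: 0 -> 1, too late */
	return f.released && atomic_load(&f.f_count) == 1;
}

/* Returns true if thread 1 managed to take a reference. */
static bool replay_conditional(void)
{
	struct fake_file f = { .released = false };

	atomic_init(&f.f_count, 1);
	fake_fput(&f);				/* thread 0: close() path */
	return fake_get_file_not_zero(&f);	/* thread 1: refused */
}
```

In this model replay_unconditional() ends with a nonzero count on an object that has already been released, which corresponds to the "now ???" step in the diagram, while replay_conditional() lets the poll side detect that the file is going away.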