On Fri, May 03, 2024 at 06:54:22PM +0700, Bui Quang Minh wrote: > [...] > Root cause: > AFAIK, eventpoll (epoll) does not increase the registered file's reference. > To ensure the safety, when the registered file is deallocated in __fput, > eventpoll_release is called to unregister the file from epoll. When calling > poll on epoll, epoll will loop through registered files and call vfs_poll on > these files. In the file's poll, file is guaranteed to be alive, however, as > epoll does not increase the registered file's reference, the file may be > dying > and it's not safe the get the file for later use. And dma_buf_poll violates > this. In the dma_buf_poll, it tries to get_file to use in the callback. This > leads to a race where the dmabuf->file can be fput twice. > > Here is the race occurs in the above proof-of-concept > > close(dmabuf->file) > __fput_sync (f_count == 1, last ref) > f_count-- (f_count == 0 now) > __fput > epoll_wait > vfs_poll(dmabuf->file) > get_file(dmabuf->file)(f_count == 1) > eventpoll_release > dmabuf->file deallocation > fput(dmabuf->file) (f_count == 1) > f_count-- > dmabuf->file deallocation > > I am not familiar with the dma_buf so I don't know the proper fix for the > issue. About the rule that don't get the file for later use in poll callback > of > file, I wonder if it is there when only select/poll exist or just after > epoll > appears. > > I hope the analysis helps us to fix the issue. Thanks for doing this analysis! I suspect at least a start of a fix would be this: diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 8fe5aa67b167..15e8f74ee0f2 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) if (events & EPOLLOUT) { /* Paired with fput in dma_buf_poll_cb */ - get_file(dmabuf->file); - - if (!dma_buf_poll_add_cb(resv, true, dcb)) + if (!atomic_long_inc_not_zero(&dmabuf->file) && + !dma_buf_poll_add_cb(resv, true, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else @@ -290,9 +289,8 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) if (events & EPOLLIN) { /* Paired with fput in dma_buf_poll_cb */ - get_file(dmabuf->file); - - if (!dma_buf_poll_add_cb(resv, false, dcb)) + if (!atomic_long_inc_not_zero(&dmabuf->file) && + !dma_buf_poll_add_cb(resv, false, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); else But this ends up leaving "active" non-zero, and at close time it runs into: BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active); But the bottom line is that get_file() is unsafe to use in some places, one of which appears to be in the poll handler. There are maybe some other fragile places too, like in drivers/gpu/drm/vmwgfx/ttm_object.c: static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) { return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; } Which I also note involves a dmabuf... Due to this issue I've proposed fixing get_file() to detect pathological states: https://lore.kernel.org/lkml/20240502222252.work.690-kees@xxxxxxxxxx/ But that has run into some push-back. I'm hoping that seeing this epoll example will help illustrate what needs fixing a little better. I think the best current proposal is to just WARN sooner instead of a full refcount_t implementation: diff --git a/include/linux/fs.h b/include/linux/fs.h index 8dfd53b52744..e09107d0a3d6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1040,7 +1040,8 @@ struct file_handle { static inline struct file *get_file(struct file *f) { - atomic_long_inc(&f->f_count); + long prior = atomic_long_fetch_inc_relaxed(&f->f_count); + WARN_ONCE(!prior, "struct file::f_count incremented from zero; use-after-free condition present!\n"); return f; } What's the right way to deal with the dmabuf situation? (And I suspect it applies to get_dma_buf_unless_doomed() as well...) -Kees -- Kees Cook