On Fri, May 3, 2024 at 6:40 AM Charan Teja Kalla <quic_charante@xxxxxxxxxxx> wrote: > > Thanks Christian/TJ for the inputs!! > > On 4/18/2024 12:16 PM, Christian König wrote: > > As far as I can see the EPOLL holds a reference to the files it > > contains. So it is perfectly valid to add the file descriptor to EPOLL > > and then close it. > > > > In this case the file won't be closed, but be kept alive by it's > > reference in the EPOLL file descriptor. > > I am not seeing that adding a 'fd' into the epoll monitoring list will > increase its refcount. Confirmed by writing a testcase that just do > open + EPOLL_CTL_ADD and see the /proc/../fdinfo of epoll fd(Added > file_count() info to the output) > # cat /proc/136/fdinfo/3 > pos: 0 > flags: 02 > mnt_id: 14 > ino: 1041 > tfd: 4 events: 19 data: 4 pos:0 ino:81 sdev:5 > refcount: 1<-- The fd added to the epoll monitoring is still 1(same as > open file refcount) > > From the code too, I don't see a file added in the epoll monitoring list > will have an extra refcount but momentarily (where it increases the > refcount of target file, add it to the rbtree of the epoll context and > then decreasing the file refcount): > do_epoll_ctl(): > f = fdget(epfd); > tf = fdget(fd); > epoll_mutex_lock(&ep->mtx) > EPOLL_CTL_ADD: > ep_insert(ep, epds, tf.file, fd, full_check); // Added to the epoll > monitoring rb tree list as epitem. > mutex_unlock(&ep->mtx); > fdput(tf); > fdput(f); > > > Not sure If i am completely mistaken what you're saying here. > > > The fs layer which calls dma_buf_poll() should make sure that the file > > can't go away until the function returns. > > > > Then inside dma_buf_poll() we add another reference to the file while > > installing the callback: > > > > /* Paired with fput in dma_buf_poll_cb */ > > get_file(dmabuf->file); No, exactly that can't > > happen either. > > > > I am not quite comfortable with epoll internals but I think below race > is possible where "The 'file' passed to dma_buf_poll() is proper but > ->f_count == 0, which is indicating that a parallel freeing is > happening". So, we should check the file->f_count value before taking > the refcount. > > (A 'fd' registered for the epoll monitoring list is maintained as > 'epitem(epi)' in the rbtree of 'eventpoll(ep, called as epoll context). > > epoll_wait() __fput()(as file->f_count=0) > ------------------------------------------------------------------------ > a) ep_poll_callback(): > Is the waitqueue function > called when signaled on the > wait_queue_head_t of the registered > poll() function. > > 1) It links the 'epi' to the ready list > of 'ep': > if (!ep_is_linked(epi)) > list_add_tail_lockless(&epi->rdllink, > &ep->rdllist) > > 2) wake_up(&ep->wq); > Wake up the process waiting > on epoll_wait() that endup > in ep_poll. > > > b) ep_poll(): > 1) while (1) { > eavail = ep_events_available(ep); > (checks ep->rdlist) > ep_send_events(ep); > (notify the events to user) > } > (epoll_wait() calling process gets > woken up from a.2 and process the > events raised added to rdlist in a.1) > > 2) ep_send_events(): > mutex_lock(&ep->mtx); > (** The sync lock is taken **) > list_for_each_entry_safe(epi, tmp, > &txlist, rdllink) { > list_del_init(&epi->rdllink); > revents = ep_item_poll(epi, &pt, 1) > (vfs_poll()-->...f_op->poll(=dma_buf_poll) > } > mutex_unlock(&ep->mtx) > (**release the lock.**) > > (As part of procession of events, > each epitem is removed from rdlist > and call the f_op->poll() of a file > which will endup in dma_buf_poll()) > > 3) dma_buf_poll(): > get_file(dmabuf->file); > (** f_count changed from 0->1 **) > dma_buf_poll_add_cb(resv, true, dcb): > (registers dma_buf_poll_cb() against fence) > > > c) eventpoll_release_file(): > mutex_lock(&ep->mtx); > __ep_remove(ep, epi, true): > mutex_unlock(&ep->mtx); > (__ep_remove() will remove the > 'epi' from rbtree and if present > from rdlist as well) > > d) file_free(file), free the 'file'. > > e) dma_buf_poll_cb: > /* Paired with get_file in dma_buf_poll */ > fput(dmabuf->file); > (f_count changed from 1->0, where > we try to free the 'file' again > which is UAF/double free). > > > > In the above race, If c) gets called first, then the 'epi' is removed > from both rbtree and 'rdlink' under ep->mtx lock thus b.2 don't end up > in calling the ->poll() as it don't see this event in the rdlist. > > Race only exist If b.2 executes first, where it will call dma_buf_poll > with __valid 'struct file' under ep->mtx but its refcount is already > could have been zero__. Later When e) is executed, it turns into double > free of the 'file' structure. > > If you're convinced with the above race, should the fix here will be > this simple check: > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > index 8fe5aa67b167..e469dd9288cc > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -240,6 +240,10 @@ static __poll_t dma_buf_poll(struct file *file, > poll_table *poll) > struct dma_resv *resv; > __poll_t events; > > + /* Check parallel freeing of file */ > + if (!file_count(file)) > + return 0; > + > > Thanks, > Charan Hi Charan, It looks like a similar conclusion about epoll was reached at: https://lore.kernel.org/all/a87d7ef8-2c59-4dc5-ba0a-b821d1effc72@xxxxxxx/ I agree with Christian that it should not be possible for the file to be freed while inside dma_buf_poll. Aside from causing problems in dma_buf_poll, ep_item_poll itself would have issues dereferencing the freed file pointer. Christian's patch there makes me wonder about what the epoll man page says about closing files. "Will closing a file descriptor cause it to be removed from all epoll interest lists?" Answer: Yes https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man7/epoll.7#n442 It looks like eventpoll_release_file is responsible for that, but if the epitem is changed to hold a reference then that can't be true anymore because __fput will never call eventpoll_release_file (until EPOLL_CTL_DEL). But how do you call EPOLL_CTL_DEL if you've closed all your references to the file in userspace? So I think epoll needs a slightly more complicated fix.