Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, May 04, 2024 at 08:40:25AM -0700, Linus Torvalds wrote:
> On Sat, 4 May 2024 at 08:32, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the
> > file closing completes, eventpoll_release() finishes [..]
> 
> Actually, Al is right that ep_item_poll() should be holding the
> ep->mtx, so eventpoll_release() -> eventpoll_release_file_file() ->
> mutex_lock(&ep->mtx) should block and the file doesn't actually get
> released.

So I know you've seen it yourself but for my own peace of mind I've said
that in the other mail and in the other thread already that all callers
of ep_item_poll() do already hold the ep->mtx:

do_epoll_ctl()
-> epoll_mutex_lock(&ep->mtx)
-> ep_insert()
   -> ep_item_poll()

do_epoll_ctl()
-> epoll_mutex_lock(&ep->mtx)
-> ep_modify()
   -> ep_item_poll()

ep_send_events()
-> mutex_lock(&ep->mtx)
-> ep_item_poll()

/* nested; and all callers of ep_item_poll() already hold ep->mtx */
__ep_eventpoll_poll()
-> mutex_lock_nested(&ep->mtx, wait)
-> ep_item_poll()

So it's simply not possible to end up with a UAF in f_op->poll() because
eventpoll_release_file_file() serializes on ep->mtx as well:

__fput()
-> eventpoll_release()
   -> eventpoll_release_file()
      {
              // @file->f_count is zero _but file is not freed_
              // so taking file->f_lock is absolutely fine
              spin_lock(&file->f_lock);
              // mark as dying

              // serialzed on ep->mtx
              mutex_lock(&ep->mtx);
              __ep_rmove(ep, epi);
              ...

      }
      -> mutex_lock(&ep->mtx)

-> f_op->release()
-> kfree(file)

So afaict it's simply not possible to end up with a UAF in f_op->poll().

And I agree with you that for some instances it's valid to take another
reference to a file from f_op->poll() but then they need to use
get_file_active() imho and simply handle the case where f_count is zero.
And we need to document that in Documentation/filesystems/file.rst or
locking.rst.

But if it's simply just dma buf that cares about that long-term
reference then really we should just force them to take the reference
like I suggested but don't penalize everyone else. When I took a glance
at all f_op->poll() implementations I didn't spot one that did take
extra references.

But if you absolutely want to have epoll take the reference before it
calls into f_op->poll() that's fine with me as well. But we might end up
forcing epoll to do a lot of final fput()s which I'm not sure is all
that desirable.



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux