On 03.05.24 at 23:24, Linus Torvalds wrote:
On Fri, 3 May 2024 at 14:11, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
What we need is
* promise that ep_item_poll() won't happen after eventpoll_release_file().
AFAICS, we do have that.
* ->poll() not playing silly buggers.
No. That is not enough at all.
Because even with perfectly normal "->poll()", and even with the
ep_item_poll() happening *before* eventpoll_release_file(), you have
this trivial race:
ep_item_poll()
->poll()
and *between* those two operations, another CPU does "close()", and
that causes eventpoll_release_file() to be called, and now f_count
goes down to zero while ->poll() is running.
So you do need to increment the file count around the ->poll() call, I feel.
Or, alternatively, you'd need to serialize with
eventpoll_release_file(), but that would need to be some sleeping lock
held over the ->poll() call.
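To make the "increment the file count around the ->poll() call" idea
concrete, here is a minimal userspace sketch. It is not kernel code and
not the actual patch: struct model_file, model_get_unless_zero(),
model_put(), model_poll() and model_item_poll() are all hypothetical
names, and C11 atomics stand in for the kernel's reference counting.
The point is only that the poll path takes a reference if and only if
the count is still non-zero, and otherwise reports nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; only the count is modelled. */
struct model_file {
    atomic_long f_count;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool model_get_unless_zero(struct model_file *f)
{
    long old = atomic_load(&f->f_count);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&f->f_count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool model_put(struct model_file *f)
{
    long old = atomic_fetch_sub(&f->f_count, 1);

    assert(old > 0);
    return old == 1;
}

/* Hypothetical ->poll(): reports "readable" while the file is alive. */
static unsigned model_poll(struct model_file *f)
{
    return atomic_load(&f->f_count) > 0 ? 1u : 0u;
}

/* Poll path: hold a short-term reference across the ->poll() call. */
static unsigned model_item_poll(struct model_file *f)
{
    unsigned events;

    if (!model_get_unless_zero(f))
        return 0;           /* file is already being torn down */

    events = model_poll(f);
    model_put(f);           /* a real implementation would free on last put */
    return events;
}
```

With the count at 1, model_item_poll() sees the file and leaves the
count unchanged afterwards; once the last reference has been dropped,
it returns 0 without ever calling model_poll().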
As it is, dma_buf ->poll() is very suspicious regardless of that
mess - it can grab a reference to the file for an unspecified interval.
I think that's actually much preferable to what epoll does, which is
to keep using files without having reference counts to them (and then
relying on magically not racing with eventpoll_release_file()).
I think it's a very important detail that epoll does not take
real references. Otherwise an application-level 'close()' on a socket
would not trigger a TCP disconnect while the fd is still registered
with epoll.
I noticed that some parts of Samba currently rely on this when I tried
to convert tevent from epoll to IORING_OP_POLL_ADD (which takes a
longer-term reference).
And I guess there will be other applications also relying on the current
epoll behavior, namely that a closed fd automatically removes itself
from epoll.
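The behavior applications rely on can be observed from user space with
a small sketch like the following. It assumes Linux and uses a pipe
instead of a socket to stay self-contained; ready_count() and
closed_fd_leaves_epoll() are hypothetical helper names. The read end is
registered with EPOLL_CTL_ADD, made readable, and then closed without
an explicit EPOLL_CTL_DEL.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Non-blocking check: how many events does epoll report right now? */
static int ready_count(int epfd)
{
    struct epoll_event out;

    assert(epfd >= 0);
    return epoll_wait(epfd, &out, 1, 0);
}

/*
 * Returns 0 if the registration was visible before close() and gone
 * afterwards, 1 if not, and -1 if the setup itself failed.
 */
static int closed_fd_leaves_epoll(void)
{
    int pipefd[2];
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd, before, after;

    if (pipe(pipefd) != 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        return -1;

    /* Make the read end readable so it shows up as ready. */
    if (write(pipefd[1], "x", 1) != 1)
        return -1;

    before = ready_count(epfd);

    /* Close the only fd for the read end; no EPOLL_CTL_DEL is issued. */
    close(pipefd[0]);

    after = ready_count(epfd);

    close(pipefd[1]);
    close(epfd);
    return (before == 1 && after == 0) ? 0 : 1;
}
```

Note that the registration follows the open file description, so this
only holds when the closed fd was the last one referring to it; a
dup()ed copy would keep the entry alive.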
A short term reference just around ->poll() might be fine,
but please no reference via EPOLL_CTL_ADD.
Changing that can cause security problems in user space.
I haven't followed all details of this thread,
please ignore me if that's all clear already :-)
Thanks!
metze