On 2023/7/11 17:39, Christian Brauner wrote:
On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
On 2023/7/10 22:12, Christian Brauner wrote:
On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@xxxxxxxxxxx wrote:
From: Wen Yang <wenyang.linux@xxxxxxxxxxx>
For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang <wenyang.linux@xxxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Dylan Yudaken <dylany@xxxxxx>
Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
So this looks ok but I would like to see an analysis of how the overflow
can happen. I'm looking at the callers and it seems that once ctx->count
hits 0, eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
there a caller that can directly or indirectly call
eventfd_ctx_do_read() with ctx->count == 0?
eventfd_read() ensures that ctx->count is not 0 before calling
eventfd_ctx_do_read(), so it is correct.
But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
eventfd_ctx_do_read() unconditionally, as that may not only cause
ctx->count to overflow, but also make an unnecessary call to
wake_up_locked_poll().
I am sorry that I only added the following tag to the patch without
including this analysis:
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Looking forward to your suggestions.
--
Best wishes,
Wen
I'm just slightly skeptical about patches that fix issues without an
analysis of how they can happen.
fs/eventfd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8aa36cd37351..10a101df19cd 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
{
lockdep_assert_held(&ctx->wqh.lock);
- *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+ *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
ctx->count -= *cnt;
}
EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
@@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
return -EFAULT;
if (ucnt == ULLONG_MAX)
return -EINVAL;
+ if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
+ return -EINVAL;
Hm, why is this bit necessary though? What's wrong with specifying ucnt == 0
with EFD_SEMAPHORE? This also looks like a (very low potential) uapi
break.
Thank you for your careful review.
As you pointed out, this may break the uapi.
Although the manual (man 2 eventfd) has the following description:
* The file descriptor is readable (the select(2) readfds argument; the
poll(2) POLLIN flag) if the counter has a value greater than 0.
* The file descriptor is writable (the select(2) writefds argument; the
poll(2) POLLOUT flag) if it is possible to write a value of at least "1"
without blocking.
it does not specify that ucnt cannot be zero, so we should just drop the
two lines of code above.
Could we propose another patch specifically to address the issue you
have identified? There are indeed some corner cases when ucnt is 0 and
ctx->count is also 0:
static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (ULLONG_MAX - ctx->count > ucnt)
		res = sizeof(ucnt);			---> always > 0
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;			---> unnecessary addition of 0
		current->in_eventfd = 1;		---> may affect eventfd_signal()
		if (waitqueue_active(&ctx->wqh))
			wake_up_locked_poll(&ctx->wqh, EPOLLIN);	---> heavyweight wakeup
		current->in_eventfd = 0;
	}
	...
}
static __poll_t eventfd_poll(struct file *file, poll_table *wait)
{
	...
	count = READ_ONCE(ctx->count);
	if (count > 0)
		events |= EPOLLIN;	---> if count is 0, all previous operations are wasted
	if (count == ULLONG_MAX)
		events |= EPOLLERR;
	if (ULLONG_MAX - 1 > count)
		events |= EPOLLOUT;
	return events;
}
Could we optimize it like this?
static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;
		if (ctx->count) {	---> avoid unnecessary heavyweight operations
			current->in_eventfd = 1;
			if (waitqueue_active(&ctx->wqh))
				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
			current->in_eventfd = 0;
		}
	}
	...
}
--
Best wishes,
Wen