On 2023/7/11 17:39, Christian Brauner wrote:
On Mon, Jul 10, 2023 at 11:02:33PM +0800, Wen Yang wrote:
On 2023/7/10 22:12, Christian Brauner wrote:
On Sun, Jul 09, 2023 at 02:54:51PM +0800, wenyang.linux@xxxxxxxxxxx wrote:
From: Wen Yang <wenyang.linux@xxxxxxxxxxx>
For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling
eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX.
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang <wenyang.linux@xxxxxxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Dylan Yudaken <dylany@xxxxxx>
Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
So this looks ok but I would like to see an analysis of how the overflow
can happen. I'm looking at the callers and it seems that once ctx->count
hits 0, eventfd_read() won't call eventfd_ctx_do_read() anymore. So is
there a caller that can directly or indirectly call
eventfd_ctx_do_read() with ctx->count == 0?
eventfd_read() ensures that ctx->count is not 0 before calling
eventfd_ctx_do_read(), so it is correct.
But it is not appropriate for eventfd_ctx_remove_wait_queue() to call
eventfd_ctx_do_read() unconditionally, as that may not only cause
ctx->count to overflow, but also make an unnecessary call to
wake_up_locked_poll().
I am sorry that I only added the following tag to the patch without
including this analysis:
Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Looking forward to your suggestions.
--
Best wishes,
Wen
I'm just slightly skeptical about patches that fix issues without an
analysis of how they can happen.
fs/eventfd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8aa36cd37351..10a101df19cd 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -189,7 +189,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
{
lockdep_assert_held(&ctx->wqh.lock);
- *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+ *cnt = ((ctx->flags & EFD_SEMAPHORE) && ctx->count) ? 1 : ctx->count;
ctx->count -= *cnt;
}
EXPORT_SYMBOL_GPL(eventfd_ctx_do_read);
@@ -269,6 +269,8 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
return -EFAULT;
if (ucnt == ULLONG_MAX)
return -EINVAL;
+ if ((ctx->flags & EFD_SEMAPHORE) && !ucnt)
+ return -EINVAL;
Hm, why is this bit necessary though? What's wrong with specifying ucnt == 0
with EFD_SEMAPHORE? This also looks like a (very low potential) uapi
break.
Thank you for your careful review.
As you pointed out, this may break the uapi.
Although the manual (man 2 eventfd) has the following description:
* The file descriptor is readable (the select(2) readfds argument; the
poll(2) POLLIN flag) if the counter has a value greater than 0.
* The file descriptor is writable (the select(2) writefds argument; the
poll(2) POLLOUT flag) if it is possible to write a value of at least "1"
without blocking.
it does not specify that ucnt cannot be zero, so we should just drop the
two lines of code above.
Could we propose another patch specifically to address the issue you
have identified? There are indeed some corner cases when ucnt is 0 and
ctx->count is also 0:
static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (ULLONG_MAX - ctx->count > ucnt)
		res = sizeof(ucnt);			---> always > 0
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;			---> unnecessary addition of 0
		current->in_eventfd = 1;		---> may affect eventfd_signal()
		if (waitqueue_active(&ctx->wqh))
			wake_up_locked_poll(&ctx->wqh, EPOLLIN);	---> heavyweight wakeup
		current->in_eventfd = 0;
	}
	...
}
static __poll_t eventfd_poll(struct file *file, poll_table *wait)
{
	...
	count = READ_ONCE(ctx->count);
	if (count > 0)
		events |= EPOLLIN;	---> if count is 0, all previous operations are wasted
	if (count == ULLONG_MAX)
		events |= EPOLLERR;
	if (ULLONG_MAX - 1 > count)
		events |= EPOLLOUT;
	return events;
}
Could we optimize it like this?
static ssize_t eventfd_write(struct file *file, const char __user *buf,
			     size_t count, loff_t *ppos)
{
	...
	if (likely(res > 0)) {
		ctx->count += ucnt;
		if (ctx->count) {	---> avoid unnecessary heavyweight operations
			current->in_eventfd = 1;
			if (waitqueue_active(&ctx->wqh))
				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
			current->in_eventfd = 0;
		}
	}
	...
}
--
Best wishes,
Wen