On 11/3/24 5:06 PM, Jens Axboe wrote:
> On 11/3/24 5:01 PM, Keith Busch wrote:
>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>> problematic commit simply by browsing git log. As indicated above;
>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>> minimal reproducer, but can provide my large one (it is testing a
>>>> larger build process in a VM) if needed?there are some additional
>>>> details in the above-linked downstream bug report, though. I hope that
>>>> having identified the problematic commit is enough for someone with
>>>> more context to go off of. Happy to provide more information if
>>>> needed.
>>>
>>> Don't worry about not having a reproducer, having the backport commit
>>> pin pointed will do just fine. I'll take a look at this.
>>
>> I think stable is missing:
>>
>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>
> I think you need to go back further than that, this one already
> unconditionally holds ->uring_lock around overflow flushing...
Took a look, it's this one:
commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date: Wed Apr 10 02:26:54 2024 +0100
io_uring: always lock __io_cqring_overflow_flush
Greg/stable, can you pick this one for 6.6-stable? It picks
cleanly.
For 6.1, which is the other stable of that age that has the backport,
the attached patch will do the trick.
With that, I believe it should be sorted. Hopefully that can make
6.6.60 and 6.1.116.
--
Jens Axboe
From 3f1c33f03386c481caf2044a836f3ca611094098 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence@xxxxxxxxx>
Date: Wed, 10 Apr 2024 02:26:54 +0100
Subject: [PATCH] io_uring: always lock __io_cqring_overflow_flush
Commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 upstream.
Conditional locking is never great, in case of
__io_cqring_overflow_flush(), which is a slow path, it's not justified.
Don't handle IOPOLL separately, always grab uring_lock for overflow
flushing.
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@xxxxxxxxx
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
io_uring/io_uring.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f902b161f02c..92c1aa8f3501 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -593,6 +593,8 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
bool all_flushed;
size_t cqe_size = sizeof(struct io_uring_cqe);
+ lockdep_assert_held(&ctx->uring_lock);
+
if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
return false;
@@ -647,12 +649,9 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
bool ret = true;
if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
- /* iopoll syncs against uring_lock, not completion_lock */
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->uring_lock);
ret = __io_cqring_overflow_flush(ctx, false);
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_unlock(&ctx->uring_lock);
+ mutex_unlock(&ctx->uring_lock);
}
return ret;
@@ -1405,6 +1404,8 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
int ret = 0;
unsigned long check_cq;
+ lockdep_assert_held(&ctx->uring_lock);
+
if (!io_allowed_run_tw(ctx))
return -EEXIST;
--
2.45.2