On 11/21/24 16:20, Jens Axboe wrote:
On 11/21/24 9:18 AM, Pavel Begunkov wrote:
On 11/21/24 15:22, Jens Axboe wrote:
On 11/21/24 8:15 AM, Jens Axboe wrote:
I'd rather entertain NOT using llists for this in the first place, as it
gets rid of the reversing, which is the main cost here. That won't change
the need for a retry list necessarily, as I think we'd be better off
with a lockless retry list still. But at least it'd get rid of the
reversing. Let me see if I can dig out that patch... Totally orthogonal
to this topic, obviously.
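
For context, the reversal cost comes from llist being LIFO: producers push onto the head of the list, so the consumer has to reverse the detached batch before it can process entries in submission order. A minimal sketch of that pattern, purely illustrative and with made-up names rather than the actual io_uring code:

#include <linux/llist.h>

struct tw_item {
	struct llist_node node;
	void (*run)(struct tw_item *item);
};

static LLIST_HEAD(work_list);

/* producer: lock-free push onto the head of the list */
static void tw_add(struct tw_item *item)
{
	llist_add(&item->node, &work_list);
}

/* consumer: detach the whole batch and run it in FIFO order */
static void tw_run_all(void)
{
	struct llist_node *node = llist_del_all(&work_list);
	struct tw_item *item, *tmp;

	/* the extra reversing pass under discussion */
	node = llist_reverse_order(node);

	llist_for_each_entry_safe(item, tmp, node, node)
		item->run(item);
}

A list kept in FIFO order under a spinlock avoids that pass entirely, at the cost of taking the lock on the add side, which, as I understand it, is the tradeoff the patch linked just below makes.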
It's here:
https://lore.kernel.org/io-uring/20240326184615.458820-3-axboe@xxxxxxxxx/
I did improve it further but never posted it again, fwiw.
io_req_local_work_add() needs a smp_mb() after the unlock, see the comments;
a release/unlock doesn't provide that.
Yep, current version I have adds a smp_mb__after_unlock_lock() for that.
I don't think that'd be correct. smp_mb__after_unlock_lock() is, AFAIK,
specifically for an unlock followed by a lock, whereas here you have a lock
followed by an unlock. And the data you want to synchronise is modified
after the lock part, so you'd need to upgrade the release semantics implied
by the unlock to a full barrier.

I doubt there is a good way to optimise it; I doubt it'd give you anything
even if you replaced the store_release in spin_unlock() with an xchg() and
ignored the return value, but you can probably ask Paul.
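
To spell out the ordering concern as a sketch (the struct and field names are hypothetical, chosen for illustration, and this is not the actual io_uring code): the unlock only has release semantics, so a load issued after it to decide whether a waiter needs waking can still be reordered before the store that publishes the new work item; a full barrier between the two is what closes the missed-wakeup window.

#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/wait.h>

struct my_ctx {
	spinlock_t work_lock;
	struct list_head work_list;
	struct wait_queue_head wait;
};

struct my_req {
	struct list_head list_node;
};

static void my_work_add(struct my_ctx *ctx, struct my_req *req)
{
	spin_lock(&ctx->work_lock);
	list_add_tail(&req->list_node, &ctx->work_list);
	spin_unlock(&ctx->work_lock);

	/*
	 * The unlock is only a RELEASE: it keeps the list_add_tail()
	 * store from moving below it, but it does not stop the
	 * waitqueue_active() load below from moving above that store.
	 * Without a full barrier the producer can read "no waiters"
	 * while the waiter, having seen an empty list, is about to
	 * sleep, and the wakeup is lost. smp_mb__after_unlock_lock()
	 * is not the right tool here: it upgrades an UNLOCK followed
	 * by a LOCK, whereas this is a LOCK followed by an UNLOCK.
	 */
	smp_mb();

	if (waitqueue_active(&ctx->wait))
		wake_up(&ctx->wait);
}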
Will do some quick testing, but then also try the double cmpxchg on top
of that if supported.
--
Pavel Begunkov