On 11/21/24 9:00 AM, Pavel Begunkov wrote:
> On 11/21/24 15:22, Jens Axboe wrote:
>> On 11/21/24 8:15 AM, Jens Axboe wrote:
>>> I'd rather entertain NOT using llists for this in the first place, as it
>>> gets rid of the reversing which is the main cost here. That won't change
>>> the need for a retry list necessarily, as I think we'd be better off
>>> with a lockless retry list still. But at least it'd get rid of the
>>> reversing. Let me see if I can dig out that patch... Totally orthogonal
>>> to this topic, obviously.
>>
>> It's here:
>>
>> https://lore.kernel.org/io-uring/20240326184615.458820-3-axboe@xxxxxxxxx/
>>
>> I did improve it further but never posted it again, fwiw.
>
> It's nice that with sth like that we're not restricted by space and be
> smarter about batching, e.g. splitting nr_tw into buckets. However, the
> overhead of spinlock could be very hard if there is contention. With

This is true, but it's also equally true for the llist - if you have
contention on adding vs running, then you'll be bouncing the cacheline
regardless. With the spinlock, you also have the added overhead of the
IRQ disabling.

> block it's more uniform which CPU tw comes from, but with network it
> could be much more random. That's what Dylan measured back than, and
> quite a similar situation that you've seen yourself before is with
> socket locks.

I'm sure he found the llist to be preferable, but that was also before
we had to add the reversing. So not so clear cut anymore, may push it
over the edge as I bet there was not much of a difference before. At
least when I benchmarked this back in March, it wasn't like llist was a
clear winner.

> Another option is to try out how a lockless list (instead of stack)
> with double cmpxchg would perform.

I did ponder that and even talked to Paul about it as well, but I never
benchmarked that one. I can try and resurrect that effort.
One annoyance there is that we need arch support, but I don't think it's
a big deal as the alternative is just to fall back to spinlock + normal
list if it's not available.

-- 
Jens Axboe
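
For readers following along, here is a rough userspace sketch of why the
reversing comes up with a lockless stack at all. It is not the kernel's
llist code and not the patch linked above: it uses C11 atomics, and every
name in it (tw_node, tw_stack, tw_push, tw_del_all, tw_reverse) is
hypothetical. Producers push at the head, so the consumer receives items
newest-first and needs one extra pass over every node to get back to
queueing order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this only mimics the shape of a
 * lockless stack and is not the kernel implementation. */
struct tw_node {
    struct tw_node *next;
    int seq;                      /* order in which the item was queued */
};

struct tw_stack {
    _Atomic(struct tw_node *) first;
};

/* Producer side: lock-free push onto the head, so the list is LIFO. */
static void tw_push(struct tw_stack *s, struct tw_node *node)
{
    assert(node != NULL);
    struct tw_node *old = atomic_load_explicit(&s->first,
                                               memory_order_relaxed);
    do {
        node->next = old;         /* 'old' is refreshed on CAS failure */
    } while (!atomic_compare_exchange_weak_explicit(&s->first, &old, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Consumer side: detach the whole list with a single atomic exchange. */
static struct tw_node *tw_del_all(struct tw_stack *s)
{
    return atomic_exchange_explicit(&s->first, NULL, memory_order_acquire);
}

/* The extra pass: touch every node once to restore FIFO order. */
static struct tw_node *tw_reverse(struct tw_node *head)
{
    struct tw_node *prev = NULL;

    while (head) {
        struct tw_node *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
```

A spinlock plus a normal list with a tail pointer queues in order and so
skips tw_reverse() entirely, at the price of taking the lock (and, as
noted above, disabling IRQs) on every add.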