On 1/17/24 5:43 PM, Bart Van Assche wrote:
> On 1/17/24 13:40, Jens Axboe wrote:
>> On 1/17/24 2:33 PM, Bart Van Assche wrote:
>>> Please note that whether or not spin_trylock() is used, there is a
>>> race condition in this approach: if dd_dispatch_request() is called
>>> just before another CPU calls spin_unlock() from inside
>>> dd_dispatch_request() then some requests won't be dispatched until the
>>> next time dd_dispatch_request() is called.
>>
>> Sure, that's not surprising. What I cared most about here is that we
>> should not have a race such that we'd stall. Since we haven't returned
>> this request just yet if we race, we know at least one will be issued
>> and we'll re-run at completion. So yeah, we may very well skip an issue,
>> that's well known within that change, which will be postponed to the
>> next queue run.
>>
>> The patch is more to demonstrate that it would not take much to fix this
>> case, at least, it's a proof-of-concept.
>
> The patch below implements what has been discussed in this e-mail
> thread. I do not recommend to apply this patch since it reduces single-

No, it implements a suggestion that you had, it had nothing to do with
what I suggested.

> threaded performance by 11% on an Intel Xeon Processor (Skylake, IBRS):

Not sure why you are even bothering sending a patch that makes things
_worse_ when the whole point is to reduce contention here. You added
another lock, and on top of that, you added code that now just bangs on
dispatch if it's busy already.

I already gave you a decent starting point with a patch that actually
reduces contention, no idea what this thing is.

--
Jens Axboe
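For readers following the thread: the trylock race Bart describes, and the "skip and rely on the re-run" behavior Jens defends, can be sketched in userspace C. This uses pthreads rather than kernel spinlocks, and all names (`dispatch_ctx`, `try_dispatch`) are hypothetical stand-ins, not the actual mq-deadline code:

```c
#include <pthread.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical stand-in for the mq-deadline dispatch state. */
struct dispatch_ctx {
	pthread_mutex_t lock;   /* analogous to dd->lock */
	int pending;            /* requests waiting to be dispatched */
	int dispatched;         /* requests actually issued */
};

/*
 * Trylock-based dispatch: if another thread already holds the lock,
 * return immediately without dispatching.  This is where the race
 * lives: a request queued just before the holder unlocks is seen by
 * neither caller, so it sits until the next queue run.  The scheme
 * does not stall, though, because the lock holder has at least one
 * request in flight, and its completion triggers another queue run.
 */
static bool try_dispatch(struct dispatch_ctx *ctx)
{
	if (pthread_mutex_trylock(&ctx->lock) != 0)
		return false;   /* contended: skip, rely on re-run */

	while (ctx->pending > 0) {
		ctx->pending--;
		ctx->dispatched++;
	}
	pthread_mutex_unlock(&ctx->lock);
	return true;
}
```

In the kernel the re-run would come from the completion path (a queue run after a request finishes), which is why a skipped dispatch is a latency hiccup rather than a hang.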