On 1/19/24 5:05 PM, Jens Axboe wrote:
> On 1/19/24 4:16 PM, Bart Van Assche wrote:
>> On 1/19/24 08:02, Jens Axboe wrote:
>>> If we attempt to insert a list of requests, but someone else is
>>> already running an insertion, then fall back to queueing that list
>>> internally and let the existing inserter finish the operation. The
>>> current inserter will either see and flush this list, or, if it ends
>>> before we're done doing our bucket insert, we'll flush it and insert
>>> ourselves.
>>>
>>> This reduces contention on the dd->lock, which protects any request
>>> insertion or dispatch, by providing a backup insertion point that
>>> will be flushed either immediately or by an existing inserter. As
>>> the alternative is to just keep spinning on the dd->lock, it's very
>>> easy to get into a situation where multiple processes are trying to
>>> do IO and all sit and spin on this lock.
>>
>> With this alternative patch I achieve 20% higher IOPS than with patch
>> 3/4 of this series for 1..4 CPU cores (null_blk + fio in an x86 VM):
>
> Performance aside, I think this is a much better approach than mine.
> Haven't tested it yet, but I think using this instead of my patch 3,
> together with the other patches, should further drastically cut down
> on the overhead. Can you send a "proper" patch and I'll just replace
> the one that I have?

Ran a quick test with this plus the incremental I sent; here's what I
see. For reference, this is before the series:

Device          IOPS    sys     contention      diff
====================================================
null_blk         879K   89%     93.6%
nvme0n1          901K   86%     94.5%

and now with the series:

Device          IOPS    sys     contention      diff
====================================================
null_blk        2867K   11.1%   ~6.0%           +326%
nvme0n1         3162K    9.9%   ~5.0%           +350%

which looks really good; it removes the last bit of contention that was
still there. And talk about a combined improvement...

--
Jens Axboe
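
For readers following the thread: below is a rough userspace sketch of
the fallback-insert idea described in the quoted commit message. All
names here (struct sched, sched_insert, sched_dispatch, park_lock, ...)
are made up for illustration; this is not the actual mq-deadline code,
just the pattern it describes: if the main lock is already held, park
the request on a side list guarded by a much cheaper lock instead of
spinning, and have whoever next takes the main lock drain that list.

/*
 * Sketch only: ignores the ordering/fairness logic a real I/O
 * scheduler needs, and uses pthread mutexes where the kernel code
 * uses spinlocks.
 */
#include <pthread.h>
#include <stddef.h>

struct request {
        int data;
        struct request *next;
};

struct sched {
        pthread_mutex_t lock;      /* stands in for dd->lock */
        pthread_mutex_t park_lock; /* cheap lock, held only to link */
        struct request *parked;    /* queued by contended inserters */
        struct request *queue;     /* real queue, protected by ->lock */
};

struct sched g_sched = {
        .lock      = PTHREAD_MUTEX_INITIALIZER,
        .park_lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Insert into the real queue. Caller holds ->lock. */
static void insert_locked(struct sched *s, struct request *rq)
{
        rq->next = s->queue;
        s->queue = rq;
}

/* Drain parked requests into the real queue. Caller holds ->lock. */
static void drain_parked(struct sched *s)
{
        struct request *rq;

        pthread_mutex_lock(&s->park_lock);
        rq = s->parked;
        s->parked = NULL;
        pthread_mutex_unlock(&s->park_lock);

        while (rq) {
                struct request *next = rq->next;

                insert_locked(s, rq);
                rq = next;
        }
}

void sched_insert(struct sched *s, struct request *rq)
{
        if (pthread_mutex_trylock(&s->lock) != 0) {
                /*
                 * Someone else is inserting or dispatching: park the
                 * request instead of spinning on ->lock. It gets
                 * drained by the current lock holder, or at latest by
                 * the next sched_dispatch() call.
                 */
                pthread_mutex_lock(&s->park_lock);
                rq->next = s->parked;
                s->parked = rq;
                pthread_mutex_unlock(&s->park_lock);
                return;
        }

        insert_locked(s, rq);
        drain_parked(s);
        pthread_mutex_unlock(&s->lock);
}

struct request *sched_dispatch(struct sched *s)
{
        struct request *rq;

        pthread_mutex_lock(&s->lock);
        /* Pick up anything parked since the last holder drained. */
        drain_parked(s);
        rq = s->queue;
        if (rq)
                s->queue = rq->next;
        pthread_mutex_unlock(&s->lock);
        return rq;
}

Note the assumption baked into the sketch: a parked request can't be
stranded as long as every insert is eventually followed by a dispatch
that takes the main lock unconditionally and drains first, which
matches how blk-mq runs the hardware queue after queueing requests.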