On Tue, Mar 18, 2025 at 03:27:48PM +1100, Dave Chinner wrote:
> Yes, NOWAIT may then add an incremental performance improvement on
> top for optimal layout cases, but I'm still not yet convinced that
> it is a generally applicable loop device optimisation that everyone
> wants to always enable due to the potential for 100% NOWAIT
> submission failure on any given loop device.....

Yes, I think this is a really good first step:

 1) switch loop to use a per-command work_item unconditionally, which
    also has the nice effect that it cleans up the horrible mess of the
    per-blkcg workers.  (note that this is what the nvmet file backend
    has always done with good results)
 2) look into NOWAIT submission.  Especially for reads this should be a
    clear winner and probably done unconditionally.  For writes it might
    be a bit of a tradeoff if we expect the writes to allocate a lot, so
    we might want some kind of tunable for it.
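
To make the two steps concrete, here is a toy userspace model in C of the dispatch policy sketched above. It is an illustration under stated assumptions, not loop driver code: `dispatch`, `submit_nowait`, `queue_cmd_work`, `struct policy` and the `nowait_writes` knob are all made-up names, and the `would_block` flag simply stands in for whatever would make a real NOWAIT submission return -EAGAIN (for example a write that has to allocate).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in the kernel. */
enum cmd_op { CMD_READ, CMD_WRITE };

struct cmd {
	enum cmd_op op;
	bool would_block;	/* stand-in for "backing file must wait or allocate" */
};

struct policy {
	bool nowait_writes;	/* assumed tunable: try NOWAIT for writes too */
};

struct stats {
	unsigned int inline_done;	/* completed in the submitter's context */
	unsigned int nowait_failed;	/* NOWAIT attempts that hit -EAGAIN */
	unsigned int queued;		/* handed to the command's own work item */
};

/* Models a NOWAIT submission: returns -EAGAIN instead of blocking. */
int submit_nowait(const struct cmd *c)
{
	return c->would_block ? -EAGAIN : 0;
}

/* Models step 1: each command has its own work item, where blocking is fine. */
void queue_cmd_work(const struct cmd *c, struct stats *s)
{
	(void)c;
	s->queued++;
}

/* Models step 2: reads always try NOWAIT first, writes only if the knob is set. */
void dispatch(const struct cmd *c, const struct policy *p, struct stats *s)
{
	bool try_nowait;

	assert(c && p && s);
	try_nowait = (c->op == CMD_READ) || p->nowait_writes;

	if (try_nowait) {
		if (submit_nowait(c) == 0) {
			s->inline_done++;
			return;
		}
		s->nowait_failed++;
	}
	/* Either NOWAIT was not attempted or it failed: fall back to the work item. */
	queue_cmd_work(c, s);
}
```

If every command in this model has `would_block` set, each NOWAIT attempt fails and every command still ends up on its work item, which is the 100% failure case from the quoted mail and the reason step 1 has to hold up on its own.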