On Tue, Mar 18, 2025 at 05:34:28PM +0800, Ming Lei wrote:
> On Tue, Mar 18, 2025 at 12:57:17AM -0700, Christoph Hellwig wrote:
> > On Tue, Mar 18, 2025 at 03:27:48PM +1100, Dave Chinner wrote:
> > > Yes, NOWAIT may then add an incremental performance improvement on
> > > top for optimal layout cases, but I'm still not yet convinced that
> > > it is a generally applicable loop device optimisation that everyone
> > > wants to always enable due to the potential for 100% NOWAIT
> > > submission failure on any given loop device.....
>
> NOWAIT failure can be avoided actually:
>
> https://lore.kernel.org/linux-block/20250314021148.3081954-6-ming.lei@xxxxxxxxxx/

That's a very complex set of heuristics which doesn't match up with
other uses of it.

> >
> > Yes, I think this is a really good first step:
> >
> > 1) switch loop to use a per-command work_item unconditionally, which also
> >    has the nice effect that it cleans up the horrible mess of the
> >    per-blkcg workers. (note that this is what the nvmet file backend has
>
> It could be worse to take per-command work, because IO handling crosses
> all system wq worker contexts.

So do other workloads with pretty good success.

>
> >    always done with good result)
>
> per-command work does burn lots of CPU unnecessarily, it isn't good for
> use case of container

That does not match my observations in say nvmet. But if you have
numbers please share them.