On Tue, Feb 04, 2025 at 06:06:42AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 04, 2025 at 01:50:08PM +1100, Dave Chinner wrote:
> > I doubt that will create enough concurrency for a typical small
> > server or desktop machine that only has a single NUMA node but has
> > a couple of fast nvme SSDs in it.
> >
> > > 2) Fixed number of writeback contexts, say min(10, numcpu).
> > > 3) NUMCPU/N number of writeback contexts.
> >
> > These don't take into account the concurrency available from the
> > underlying filesystem or storage.
> >
> > That's the point I was making - CPU count has -zero- relationship
> > to the concurrency the filesystem and/or storage provide the
> > system. It is fundamentally incorrect to base decisions about IO
> > concurrency on the number of CPU cores in the system.
>
> Yes.  But as mentioned in my initial reply there is a use case for
> more WB threads than fs writeback contexts, which is when the
> writeback

I understand that - there's more than one reason we may want multiple
IO dispatch threads per writeback context.

> threads do CPU intensive work like compression.  Being able to do
> that from normal writeback threads vs forking out to fs level
> threads would really simplify the btrfs code a lot.  Not really
> interesting for XFS right now of course.
>
> Or in other words: fs / device geometry really should be the main
> driver, but if a file system supports compression (or really
> expensive data checksums) being able to scale up the number of
> threads per context might still make sense.  But that's really the
> advanced part; we'll need to get the fs geometry alignment working
> first.
>
> > > > XFS largely stem from the per-IO cpu overhead of block
> > > > allocation in the filesystems (i.e. delayed allocation).
> > >
> > > This is a good idea, but it means we will not be able to
> > > parallelize within an AG.
> >
> > The XFS allocation concurrency model scales out across AGs, not
> > within AGs. If we need writeback concurrency within an AG (e.g.
> > for pure overwrite concurrency) then we want async dispatch worker
> > threads per writeback queue, not multiple writeback queues per
> > AG....
>
> Unless the computational work mentioned above is involved there
> would be something really wrong if we're saturating a CPU per AG.

... and the common case I'm thinking of here is writeback stalling
on the AGF lock. i.e. another non-writeback task holding the AGF
lock - inode cluster allocation, directory block allocation, freeing
space in that AG, etc - will stall any writeback that requires
allocation in that AG if we don't have multiple dispatch threads per
writeback context available. (A rough sketch of that dispatch model
is appended at the end of this mail.)

That, IMO, is a future optimisation; just moving to a writeback
context per AG would solve a lot of the current "single thread CPU
bound" writeback throughput limitations we have right now,
especially when it comes to writing lots of small files across many
directories in a very short period of time (i.e. unpacking tarballs,
rsync, recursive dir copies, etc).

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
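
To make that dispatch model concrete, here is a minimal userspace
sketch - pthreads standing in for kernel worker threads, a plain
mutex standing in for the AGF buffer lock. All the names here
(wb_ctx, dispatch, NR_AGS, THREADS_PER_CTX) are made up for
illustration and are not existing kernel or XFS interfaces: one
writeback context per AG, more than one dispatch thread per context,
so a dispatcher stalled on the per-AG allocation lock only idles
itself.

/* build: cc -pthread wb_sketch.c -o wb_sketch */
#include <pthread.h>
#include <stdio.h>

#define NR_AGS		4	/* contexts follow fs geometry, not CPU count */
#define THREADS_PER_CTX	2	/* >1 hides per-AG allocation lock stalls */

struct wb_ctx {
	int		agno;		/* AG this context writes back into */
	pthread_mutex_t	agf_lock;	/* stand-in for the AGF buffer lock */
};

static void *dispatch(void *arg)
{
	struct wb_ctx *ctx = arg;

	for (int i = 0; i < 4; i++) {
		if (i & 1) {
			/* Pure overwrite: no allocation, no AGF lock needed. */
			printf("AG %d: submit overwrite IO\n", ctx->agno);
			continue;
		}
		/*
		 * Delayed allocation takes the per-AG lock. If another
		 * task holds it, only this dispatcher stalls here; the
		 * other dispatcher on this context can keep pushing IO
		 * that doesn't need allocation.
		 */
		pthread_mutex_lock(&ctx->agf_lock);
		printf("AG %d: allocate + submit IO\n", ctx->agno);
		pthread_mutex_unlock(&ctx->agf_lock);
	}
	return NULL;
}

int main(void)
{
	struct wb_ctx ctx[NR_AGS];
	pthread_t tid[NR_AGS][THREADS_PER_CTX];

	for (int ag = 0; ag < NR_AGS; ag++) {
		ctx[ag].agno = ag;
		pthread_mutex_init(&ctx[ag].agf_lock, NULL);
		for (int t = 0; t < THREADS_PER_CTX; t++)
			pthread_create(&tid[ag][t], NULL, dispatch, &ctx[ag]);
	}
	for (int ag = 0; ag < NR_AGS; ag++)
		for (int t = 0; t < THREADS_PER_CTX; t++)
			pthread_join(tid[ag][t], NULL);
	return 0;
}

Note that NR_AGS is derived from the filesystem geometry; nothing in
the sketch scales with CPU count, which is the point being argued
above.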