On Sat, May 20, 2023 at 05:36:00PM +0100, Matthew Wilcox (Oracle) wrote:
> Wang Yugui has a workload which would be improved by using large folios.

I think that's a bit of a misrepresentation of the situation.

The workload in question has regressed from ~8GB/s to 2.5GB/s due to
page cache structure contention caused by XFS limiting writeback bio
chain length in 5.15 to bound worst case IO completion latency. This
results in several tasks attempting concurrent exclusive locking of
the mapping tree: write(), writeback IO submission, writeback IO
completion and multiple memory reclaim tasks (both direct and
background).

Limiting worst case latency means that IO completion is accessing the
mapping tree much more frequently (every 16MB, instead of 16-32GB),
and that has driven this workload into lock contention breakdown.

This was shown in profiles indicating the regression was caused by
page cache contention causing excessive CPU usage in the writeback
flusher thread, which in turn limited IO submission rates.

This is not something we can fix in XFS - it's an exclusive lock
access issue in the page cache...

Mitigating the performance regression by enabling large folios for
XFS doesn't actually fix any of the locking problems, it just reduces
lock traffic in the IO path by a couple of orders of magnitude. The
problem will come back as bandwidth increases again. Also, the same
problem will affect other filesystems that aren't capable of using
large folios.

Hence I think we really need to document this problem with the
mitigations being proposed so that, in future, we know how to
recognise when we hit these page cache limitations again. i.e. I
think it would also be a good idea to include some of the analysis
that pointed to the page cache contention problem here (either in the
cover letter so it can be used as a merge tag message, or in a
commit), rather than present it as "here's a performance improvement"
without any context of what problem it's actually mitigating...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx