On Mon, Jul 10, 2023 at 03:55:17PM -0700, Luis Chamberlain wrote: > On Mon, Jul 10, 2023 at 02:02:44PM +0100, Matthew Wilcox (Oracle) wrote: > > Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries > > to improve worst-case latency. Unfortunately, this had the effect of > > limiting the performance of: > > > > fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \ > > -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \ > > -numjobs=4 -directory=/mnt/test > > When you say performance, do you mean overall throughput / IOPS / > latency or all? This is buffered I/O, so when we run out of RAM, we block until the dirty pages are written back. I suppose that makes it throughput, but it's throughput from the bottom of the page cache to storage, not the usual app-to-page-cache bottleneck. > And who noticed it / reported it? The above incantation seems pretty > specific so I'm curious who runs that test and what sort of work flow > is it trying to replicate. Wang Yugui, who is on the cc reported it. https://lore.kernel.org/linux-xfs/20230508172406.1CF3.409509F4@xxxxxxxxxxxx/ > > The problem ends up being lock contention on the i_pages spinlock as we > > clear the writeback bit on each folio (and propagate that up through > > the tree). By using larger folios, we decrease the number of folios > > to be processed by a factor of 256 for this benchmark, eliminating the > > lock contention. > > Implied here seems to suggest that the associated cost for the search a > larger folio is pretty negligable compared the gains of finding one. > That seems to be nice but it gets me wondering if there are other > benchmarks under which there is any penalties instead. > > Ie, is the above a microbenchmark where this yields good results? It happens to be constructed in such a way that it yields the optimum outcome in this case, but that clearly wasn't deliberate. And the solution is the one I had in mind from before the bug report came in. I don't think you'll be able to find a benchmark that regresses as a result of this, but I welcome your attempt! > > It's also the right thing to do. This is a project that has been on > > the back burner for years, it just hasn't been important enough to do > > before now. > > Commit ebb7fb1557b1 (xfs, iomap: limit individual ioend chain lengths in > writeback") dates back to just one year, and so it gets me wondering > how a project in the back burner for years now finds motivation for > just a one year old regression. > > What was the original motivation of the older project dating this > effort back to its inception? We should create larger folios on write. It just wasn't important enough to do before now. But now there's an actual user who cares.