Re: [PATCH v4 0/9] Create large folios in iomap buffered write path

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 11 Jul 2023 00:53:04 +0100

On Mon, Jul 10, 2023 at 03:55:17PM -0700, Luis Chamberlain wrote:
> On Mon, Jul 10, 2023 at 02:02:44PM +0100, Matthew Wilcox (Oracle) wrote:
> > Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries
> > to improve worst-case latency.  Unfortunately, this had the effect of
> > limiting the performance of:
> > 
> > fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
> >         -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
> >         -numjobs=4 -directory=/mnt/test
> 
> When you say performance, do you mean overall throughput / IOPS /
> latency or all?

This is buffered I/O, so when we run out of RAM, we block until the
dirty pages are written back.  I suppose that makes it throughput, but
it's throughput from the bottom of the page cache to storage, not the
usual app-to-page-cache bottleneck.

> And who noticed it / reported it? The above incantation seems pretty
> specific so I'm curious who runs that test and what sort of work flow
> is it trying to replicate.

Wang Yugui, who is on the cc reported it.
https://lore.kernel.org/linux-xfs/20230508172406.1CF3.409509F4@xxxxxxxxxxxx/

> > The problem ends up being lock contention on the i_pages spinlock as we
> > clear the writeback bit on each folio (and propagate that up through
> > the tree).  By using larger folios, we decrease the number of folios
> > to be processed by a factor of 256 for this benchmark, eliminating the
> > lock contention.
> 
> Implied here seems to suggest that the associated cost for the search a
> larger folio is pretty negligable compared the gains of finding one.
> That seems to be nice but it gets me wondering if there are other
> benchmarks under which there is any penalties instead.
> 
> Ie, is the above a microbenchmark where this yields good results?

It happens to be constructed in such a way that it yields the optimum
outcome in this case, but that clearly wasn't deliberate.  And the
solution is the one I had in mind from before the bug report came in.

I don't think you'll be able to find a benchmark that regresses as
a result of this, but I welcome your attempt!

> > It's also the right thing to do.  This is a project that has been on
> > the back burner for years, it just hasn't been important enough to do
> > before now.
> 
> Commit ebb7fb1557b1 (xfs, iomap: limit individual ioend chain lengths in
> writeback") dates back to just one year, and so it gets me wondering
> how a project in the back burner for years now finds motivation for
> just a one year old regression.
> 
> What was the original motivation of the older project dating this
> effort back to its inception?

We should create larger folios on write.  It just wasn't important
enough to do before now.  But now there's an actual user who cares.