Re: [PATCH 0/5] fs/buffer: strack reduction on async read

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Mon, 30 Dec 2024 09:30:53 -0800

On Thu, Dec 19, 2024 at 03:51:34AM +0000, Matthew Wilcox wrote:
> So folio->mapping is NULL.
> 
> Ah, I see the problem.  end_buffer_async_read() uses the buffer_async_read
> test to decide if all buffers on the page are uptodate or not.  So both
> having no batch (ie this patch) and having a batch which is smaller than
> the number of buffers in the folio can lead to folio_end_read() being
> called prematurely (ie we'll unlock the folio before finishing reading
> every buffer in the folio).
> 
> Once the folio is unlocked, it can be truncated.  That's a second-order
> problem, but it's the one your test happened to hit.
> 
> This should fix the problem; we always have at least one BH held in
> the submission path with the async_read flag set, so
> end_buffer_async_read() will not end it prematurely.

Oh neat, yes.

> By the way, do you have CONFIG_VM_DEBUG enabled in your testing?

You mean DEBUG_VM ? Yes:

grep DEBUG_VM .config
CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
CONFIG_DEBUG_VM_IRQSOFF=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_MAPLE_TREE is not set
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_VM_PGFLAGS=y
# CONFIG_DEBUG_VM_PGTABLE is not set

>         VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> in folio_end_read() should have tripped before hitting the race with
> truncate.

Odd that it did not, I had run into that folio_test_locked() splat but in my
attempts to simplify this without your trick to only run into the similar
truncate race, your resolution to this is nice.

> diff --git a/fs/buffer.c b/fs/buffer.c

This is a nice resolution and simplification, thanks, I've tested it and
passes without regressions on ext4. I'll take this into this series as
an alternative.

  Luis