On Thu, Dec 19, 2024 at 03:51:34AM +0000, Matthew Wilcox wrote: > So folio->mapping is NULL. > > Ah, I see the problem. end_buffer_async_read() uses the buffer_async_read > test to decide if all buffers on the page are uptodate or not. So both > having no batch (ie this patch) and having a batch which is smaller than > the number of buffers in the folio can lead to folio_end_read() being > called prematurely (ie we'll unlock the folio before finishing reading > every buffer in the folio). > > Once the folio is unlocked, it can be truncated. That's a second-order > problem, but it's the one your test happened to hit. > > This should fix the problem; we always have at least one BH held in > the submission path with the async_read flag set, so > end_buffer_async_read() will not end it prematurely. Oh neat, yes. > By the way, do you have CONFIG_VM_DEBUG enabled in your testing? You mean DEBUG_VM ? Yes: grep DEBUG_VM .config CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y CONFIG_DEBUG_VM_IRQSOFF=y CONFIG_DEBUG_VM=y # CONFIG_DEBUG_VM_MAPLE_TREE is not set # CONFIG_DEBUG_VM_RB is not set CONFIG_DEBUG_VM_PGFLAGS=y # CONFIG_DEBUG_VM_PGTABLE is not set > VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); > in folio_end_read() should have tripped before hitting the race with > truncate. Odd that it did not, I had run into that folio_test_locked() splat but in my attempts to simplify this without your trick to only run into the similar truncate race, your resolution to this is nice. > diff --git a/fs/buffer.c b/fs/buffer.c This is a nice resolution and simplification, thanks, I've tested it and passes without regressions on ext4. I'll take this into this series as an alternative. Luis