Re: possible regression fs corruption on 64GB nvme

Keith Busch <kbusch@xxxxxxxxxx> · Mon, 9 Sep 2024 13:29:09 -0600

On Mon, Sep 09, 2024 at 07:34:15PM +0100, Robert Beckett wrote:
> After a lot of testing, we managed to get a repro case that would trigger within 2-3 tests using the desync tool [2], reducing the repro time from a day or more to minutes. For repro steps see [3].
> We bisected the issue to 
> 
> da9619a30e73b dmapool: link blocks across pages
> https://lore.kernel.org/all/20230126215125.4069751-12-kbusch@xxxxxxxx/T/#u

That's not the patch that was ultimately committed. Still, that's the
one I tested extensively with nvme, so the changes in the committed
version shouldn't make a difference as far as the protocol is concerned.

> Some other thoughts about the issue:
> 
> - we have received reports of occasional filesystem corruption on both btrfs and ext4 filesystems on the same disk, so this doesn't appear to be fs related
> - it only seems to affect these 64GB single queue simple disks. Other devices with more capable disks have not shown this issue.
> - using simple dd or md5sum testing does not show the issue. desync seems to be very parallel in its attack patterns.
> - I was investigating a previous potential regression that was deemed not an issue https://lkml.org/lkml/2023/2/21/762 . I assume nvme doesn't need its addresses to be ordered. I'm not familiar with the spec.

nvme should not care about address ordering. The dma buffers are all
pulled from the same pool for all threads, and could be dispatched in
different orders than what was allocated, so any order should be fine.

> I'd appreciate any advice you may have on why this dmapool patch could potentially cause or expose an issue with these nvme devices.
> If any more info would be useful to help diagnose, I'll happily provide it.

Did you try with CONFIG_SLUB_DEBUG_ON enabled?