Re: Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll on kernel 6.3

Eric Sandeen <sandeen@xxxxxxxxxxx> · Wed, 24 May 2023 21:15:55 -0500

On 5/2/23 2:14 PM, Mike Pastore wrote:
Hi folks,

I was playing around with some blockchain projects yesterday and had
some curious crashes while syncing blockchain databases on XFS
filesystems under kernel 6.3.

   * kernel 6.3.0 and 6.3.1 (ubuntu mainline)
   * w/ and w/o the discard mount flag
   * w/ and w/o -m crc=0
   * ironfish (nodejs) and ergo (jvm)

The hardware is as follows:

   * Asus PRIME H670-PLUS D4
   * Intel Core i5-12400
   * 32GB DDR4-3200 Non-ECC UDIMM

In all cases the filesystems were newly-created under kernel 6.3 on an
LVM2 stripe and mounted with the noatime flag. Here is the output of
the mkfs.xfs command (after reverting to 6.2.14, which I realize may not
be the most helpful thing, but here it is anyway):

$ sudo lvremove -f vgtethys/ironfish
$ sudo lvcreate -n ironfish -L 10G -i2 vgtethys /dev/nvme[12]n1p3
   Using default stripesize 64.00 KiB.
   Logical volume "ironfish" created.
$ sudo mkfs.xfs -m crc=0 -m uuid=b4725d43-a12d-42df-981a-346af2809fad \
    -s size=4096 /dev/vgtethys/ironfish
meta-data=/dev/vgtethys/ironfish isize=256    agcount=16, agsize=163824 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=2621184, imaxpct=25
         =                       sunit=16     swidth=32 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

The applications crash with I/O errors. Here's what I see in dmesg:

May 01 18:56:59 tethys kernel: XFS (dm-28): Internal error bno + len > gtbno at line 1908 of file fs/xfs/libxfs/xfs_alloc.c.  Caller xfs_free_ag_extent+0x14e/0x950 [xfs]
May 01 18:56:59 tethys kernel: CPU: 2 PID: 48657 Comm: node Tainted: P
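
The failed check in that log line comes from xfs_free_ag_extent, which compares the extent being freed (start block bno, length len) with the next free extent to its right in the allocation group's by-block free-space btree (start block gtbno). If bno + len > gtbno, part of the range being freed is already recorded as free, which the kernel treats as corruption rather than merging the extents. The shell sketch below only restates that condition with made-up block numbers; check_right_neighbor is a hypothetical name, not part of xfsprogs or the kernel.

```shell
#!/bin/sh
# Hypothetical helper restating the XFS check "bno + len > gtbno":
#   bno, len : start block and length of the extent being freed
#   gtbno    : start block of the next free extent to the right
# The free is consistent only if the freed range ends at or before gtbno.
check_right_neighbor() {
  bno=$1
  len=$2
  gtbno=$3
  if [ $((bno + len)) -gt "$gtbno" ]; then
    echo "corrupt"   # overlap: part of [bno, bno+len) is already free
    return 1
  fi
  echo "ok"          # ends before, or exactly at, the next free extent
}

# Made-up block numbers, for illustration only.
check_right_neighbor 100 8 108           # blocks 100..107 end at 108: contiguous, ok
check_right_neighbor 100 8 104 || true   # blocks 104..107 already free: corrupt
```

When bno + len equals gtbno the two free ranges are simply adjacent, which is why only a strictly greater value trips the check.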

What proprietary module do you have loaded?

Does the problem reproduce without it?

-Eric
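
The "P" in the oops header is taint bit 0, which the kernel sets when a module with a non-GPL-compatible license has been loaded. A minimal sketch for answering the question above on the affected machine, assuming a Linux system that exposes /proc/sys/kernel/tainted and per-module /sys/module/*/taint files; the function names are hypothetical, and only three of the documented taint bits are decoded.

```shell
#!/bin/sh
# Sketch: decode a few kernel taint bits and list modules tainted 'P'.
# Bit values follow the kernel's "Tainted kernels" documentation.

# Print the flag letters for bits 0, 12 and 13 of taint mask $1.
decode_taint() {
  mask=$1
  flags=""
  if [ $((mask & 1)) -ne 0 ]; then flags="${flags}P"; fi     # bit 0: proprietary module
  if [ $((mask & 4096)) -ne 0 ]; then flags="${flags}O"; fi  # bit 12: out-of-tree module
  if [ $((mask & 8192)) -ne 0 ]; then flags="${flags}E"; fi  # bit 13: unsigned module
  echo "$flags"
}

# List loaded modules whose sysfs taint file contains 'P'.
list_proprietary_modules() {
  for f in /sys/module/*/taint; do
    [ -r "$f" ] || continue          # glob may not match; skip unreadable entries
    case "$(cat "$f")" in
      *P*)
        m=${f%/taint}                # strip the trailing "/taint"
        echo "${m##*/}"              # keep only the module name
        ;;
    esac
  done
}

mask=$(cat /proc/sys/kernel/tainted 2>/dev/null || echo 0)
echo "taint mask: $mask (decoded: $(decode_taint "$mask"))"
list_proprietary_modules
```

Any module the last step prints is a candidate to leave unloaded when retesting, which is what the second question asks about.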