On Tue, May 02, 2023 at 02:14:34PM -0500, Mike Pastore wrote:
> Hi folks,
>
> I was playing around with some blockchain projects yesterday and had
> some curious crashes while syncing blockchain databases on XFS
> filesystems under kernel 6.3.
>
> * kernel 6.3.0 and 6.3.1 (ubuntu mainline)
> * w/ and w/o the discard mount flag
> * w/ and w/o -m crc=0
> * ironfish (nodejs) and ergo (jvm)
>
> The hardware is as follows:
>
> * Asus PRIME H670-PLUS D4
> * Intel Core i5-12400
> * 32GB DDR4-3200 Non-ECC UDIMM
>
> In all cases the filesystems were newly-created under kernel 6.3 on an
> LVM2 stripe and mounted with the noatime flag. Here is the output of
> the mkfs.xfs command (after reverting back to 6.2.14 - which I realize
> may not be the most helpful thing, but here it is anyway):
>
> $ sudo lvremove -f vgtethys/ironfish
> $ sudo lvcreate -n ironfish -L 10G -i2 vgtethys /dev/nvme[12]n1p3
>   Using default stripesize 64.00 KiB.
>   Logical volume "ironfish" created.
> $ sudo mkfs.xfs -m crc=0 -m uuid=b4725d43-a12d-42df-981a-346af2809fad
> -s size=4096 /dev/vgtethys/ironfish
> meta-data=/dev/vgtethys/ironfish isize=256    agcount=16, agsize=163824 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0        finobt=0, sparse=0, rmapbt=0
>          =                       reflink=0    bigtime=0 inobtcount=0
> data     =                       bsize=4096   blocks=2621184, imaxpct=25
>          =                       sunit=16     swidth=32 blks

Stripe aligned allocation is enabled. Does the problem go away when you
use mkfs.xfs -d noalign .... ?

> The applications crash with I/O errors. Here's what I see in dmesg:
>
> May 01 18:56:59 tethys kernel: XFS (dm-28): Internal error
> bno + len > gtbno at line 1908 of file fs/xfs/libxfs/xfs_alloc.c.
> Caller xfs_free_ag_extent+0x14e/0x950 [xfs]

	/*
	 * If this failure happens the request to free this
	 * space was invalid, it's (partly) already free.
	 * Very bad.
	 */
	if (XFS_IS_CORRUPT(mp, ltbno + ltlen > bno)) {
		error = -EFSCORRUPTED;
		goto error0;
	}

That failure implies the btree records are corrupt in memory, possibly
due to memory corruption from something outside the XFS code (e.g. use
after free).

> May 01 18:56:59 tethys kernel: CPU: 2 PID: 48657 Comm: node Tainted: P
> OE 6.3.1-060301-generic #202304302031

The kernel being run has been tainted by out of tree proprietary
drivers (a common source of memory corruption bugs in my experience).
Can you reproduce this problem with an untainted kernel?

....

> And here's what I see in dmesg after rebooting and attempting to mount
> the filesystem to replay the log:
>
> May 01 21:34:15 tethys kernel: XFS (dm-35): Metadata corruption
> detected at xfs_inode_buf_verify+0x168/0x190 [xfs], xfs_inode block
> 0x1405a0 xfs_inode_buf_verify
> May 01 21:34:15 tethys kernel: XFS (dm-35): Unmount and run xfs_repair
> May 01 21:34:15 tethys kernel: XFS (dm-35): First 128 bytes of
> corrupted metadata buffer:
> May 01 21:34:15 tethys kernel: 00000000: 5b 40 e2 3a ae 52 a0 7a 17 1d

That's not an inode buffer. It's not recognisable as XFS metadata at
all, which indicates some other problem.

Oh, this was from a test with "mkfs.xfs -m crc=0 ...", right?

Please don't use "-m crc=0" - that format is deprecated partly because
it has unfixable on-disk format recovery issues. One of those issues
manifests as an inode recovery failure because the underlying inode
buffer allocation/init does not get replayed correctly before we
attempt to replay inode changes into the buffer (that has not been
initialised)....

i.e. one of those unfixable issues manifests exactly like the recovery
failure being reported here.
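For the retest, something like the sketch below should give you an
unaligned, CRC enabled filesystem to run against, and confirm the
kernel is untainted before you start the workload. This is untested
here and the device path is just carried over from your lvcreate
example above, so adjust to your setup.

	# 0 means no taint flags are set
	$ cat /proc/sys/kernel/tainted

	# remake the test filesystem with CRCs enabled (the default)
	# and stripe alignment turned off
	$ sudo mkfs.xfs -f -d noalign /dev/vgtethys/ironfish
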
> Blockchain projects tend to generate pathological filesystem loads;
> the sustained random write activity and constant (re)allocations must
> be pushing on some soft spot here.

There was a significant allocator infrastructure rewrite in 6.3. If
running an untainted kernel on an unaligned, CRC enabled filesystem
makes the problems go away, then it rules out known issues with the
rewrite.

Alternatively, if it is reproducible in a short time, you may be able
to bisect the XFS changes that landed between 6.2 and 6.3 to find
which change triggers the problem.

> Reverting to kernel 6.2.14 and recreating the filesystems seems to
> have resolved the issue - so far, at least - but obviously this is
> less than ideal. If someone would be willing to provide a targeted
> list of desired artifacts I'd be happy to boot back into kernel 6.3.1
> to reproduce the issue and collect them. Alternatively I can try to
> eliminate some variables (like LVM2, potential hardware
> instabilities, etc.) and provide step-by-step directions for
> reproducing the issue on another machine.

If you can find a minimal reproducer, that would help a lot in
diagnosing the issue.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx