[Bug 217769] New: XFS crash on mount on kernels >= 6.1

bugzilla-daemon@xxxxxxxxxx · Mon, 07 Aug 2023 16:35:35 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=217769

            Bug ID: 217769
           Summary: XFS crash on mount on kernels >= 6.1
           Product: File System
           Version: 2.5
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: XFS
          Assignee: filesystem_xfs@xxxxxxxxxxxxxxxxxxxxxx
          Reporter: xani666@xxxxxxxxx
        Regression: No

Created attachment 304789
  --> https://bugzilla.kernel.org/attachment.cgi?id=304789&action=edit
full boot sequence where XFS fails

Background:

I've seen this happen on a few machines now, both consumer (my personal laptop)
and enterprise (VMs on a server with ECC memory), so I think disk corruption of
any kind can be ruled out here.

What the machines have in common is that all of them run Debian and have been
dist-upgraded multiple times. IIRC the filesystem was originally formatted on a
4.9 kernel, so basically XFS was formatted a long time ago:

xfs_db> version
versionnum [0xb4b4+0x8a] =
V4,NLINK,DIRV2,ATTR,ALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT
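
For readers who want to check the flag list against the raw numbers, here is a
minimal sketch that decodes the two values printed by xfs_db. The bit values
are the XFS_SB_VERSION_* and XFS_SB_VERSION2_* masks as I understand them from
the kernel's fs/xfs/libxfs/xfs_format.h; the short names and the decode()
helper are illustrative choices, not xfs_db's own code, and xfs_db prints the
flags in a different order.

```python
# Bit masks assumed from fs/xfs/libxfs/xfs_format.h (XFS_SB_VERSION_*BIT).
VERSION_BITS = {
    0x0010: "ATTR",
    0x0020: "NLINK",
    0x0040: "QUOTA",
    0x0080: "ALIGN",
    0x0100: "DALIGN",
    0x0200: "SHARED",
    0x0400: "LOGV2",
    0x0800: "SECTOR",
    0x1000: "EXTFLG",
    0x2000: "DIRV2",
    0x4000: "BORG",
    0x8000: "MOREBITS",
}

# Bit masks assumed from the same header (XFS_SB_VERSION2_*).
FEATURES2_BITS = {
    0x0002: "LAZYSBCOUNT",
    0x0008: "ATTR2",
    0x0010: "PARENT",
    0x0080: "PROJID32BIT",
    0x0100: "CRC",
    0x0200: "FTYPE",
}


def decode(versionnum, features2):
    """Return the feature names encoded in sb_versionnum and sb_features2."""
    # The low four bits hold the superblock version number itself.
    flags = [f"V{versionnum & 0x000f}"]
    flags += [name for bit, name in VERSION_BITS.items() if versionnum & bit]
    # features2 is only meaningful when MOREBITS is set in versionnum.
    if versionnum & 0x8000:
        flags += [name for bit, name in FEATURES2_BITS.items() if features2 & bit]
    return flags


# Values taken from the xfs_db output above: [0xb4b4+0x8a]
print(",".join(decode(0xb4b4, 0x8A)))
```

With these masks, 0xb4b4+0x8a yields the same set of eleven flags that xfs_db
reports: a V4 superblock with neither CRC nor FTYPE set.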

xfs_info /dev/vda2 
meta-data=/dev/vda2              isize=256    agcount=4, agsize=624064 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=2496256, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
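
As a quick sanity check on the geometry above, the following sketch redoes the
arithmetic implied by the xfs_info output. The dictionary keys are just labels
I picked for the printed fields. Note that in general the last allocation
group may be shorter than agsize, so agcount * agsize is only an upper bound
on the data block count; on this filesystem the two happen to be equal.

```python
# Numbers copied from the xfs_info output above; key names are illustrative.
geometry = {
    "agcount": 4,
    "agsize": 624064,       # blocks per allocation group
    "bsize": 4096,          # filesystem block size in bytes
    "blocks": 2496256,      # total data blocks
    "log_blocks": 2560,     # internal log size in blocks
}

# Total blocks spanned by all allocation groups at full size.
ag_blocks = geometry["agcount"] * geometry["agsize"]

# Sizes in bytes of the data section and the internal log.
data_bytes = geometry["blocks"] * geometry["bsize"]
log_bytes = geometry["log_blocks"] * geometry["bsize"]

print("AG blocks match data blocks:", ag_blocks == geometry["blocks"])
print(f"data section: {data_bytes} bytes ({data_bytes / 2**30:.2f} GiB)")
print(f"internal log: {log_bytes} bytes ({log_bytes / 2**20:.0f} MiB)")
```

So this is a small (roughly 9.5 GiB) filesystem with a 10 MiB internal log.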

And now *some* of them have started to crash almost immediately on boot. After
upgrading to Debian's 6.1 kernel I get the crash; booting an older kernel
(5.17, 5.10 or 4.19) works fine. I also tried 6.5~rc4 from experimental, with
the same result, and 6.3 on my laptop (where it only crashes on one of the
partitions, as shown in the attachment).

The other interesting part is that the server mentioned in the logs has an HA
partner that was formatted on the same date and used in nearly the same way,
but (so far) doesn't show the same behaviour.

In both cases XFS complains about in-memory corruption, but I don't believe
that's the case: besides one of the systems having ECC memory (with a dozen
other VMs running on it without problems), I also memtested them for about an
hour just to be sure. I also booted that VM on another machine, with the same
result.

The log in question: 

[   16.115185] ------------[ cut here ]------------
[   16.115856] WARNING: CPU: 2 PID: 646 at fs/xfs/xfs_inode.c:1831
xfs_iunlink+0x165/0x1e0 [xfs]
[   16.118200] Modules linked in: tcp_diag inet_diag binfmt_misc ext4
ghash_clmulni_intel sha512_ssse3 sha512_generic crc16 mbcache jbd2 aesni_intel
crypto_simd cryptd cirrus drm_shmem_helper drm_kms_helper i6300esb
virtio_balloon watchdog pcspkr button joydev evdev serio_raw loop fuse drm
efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng rng_core ip_tables x_tables
autofs4 hid_generic usbhid hid xfs libcrc32c crc32c_generic ata_generic
virtio_net virtio_blk net_failover failover uhci_hcd ata_piix crct10dif_pclmul
crct10dif_common ehci_hcd libata virtio_pci crc32_pclmul floppy crc32c_intel
virtio_pci_legacy_dev virtio_pci_modern_dev virtio scsi_mod usbcore psmouse
scsi_common virtio_ring usb_common i2c_piix4
[   16.129279] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Not tainted 6.5.0-0-amd64
#1  Debian 6.5~rc4-1~exp1
[   16.129290] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1
01/01/2007
[   16.129293] RIP: 0010:xfs_iunlink+0x165/0x1e0 [xfs]
[   16.134173] Code: 89 4c 24 04 72 2d e8 ea 5f f1 d8 8b 74 24 04 48 8d bd e0
00 00 00 e8 8a 40 94 d9 48 89 c3 48 85 c0 74 07 48 83 78 20 00 75 26 <0f> 0b e8
24 9c f1 d8 eb 13 f3 0f 1e fa 48 c7 c6 be fe 64 c0 4c 89
[   16.134177] RSP: 0018:ffffae6740c67b60 EFLAGS: 00010246
[   16.134189] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000006
[   16.134192] RDX: 0000000000000000 RSI: ffff9810817bfd98 RDI:
0000000000088b40
[   16.134194] RBP: ffff9810809cd800 R08: ffff9810817bff28 R09:
ffff9810809cd8e0
[   16.134196] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff9810861e2ae0
[   16.134199] R13: 00000000000101c0 R14: ffff981081642900 R15:
ffff981084217c00
[   16.146474] FS:  00007f89734e06c0(0000) GS:ffff981119d00000(0000)
knlGS:0000000000000000
[   16.146478] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.146480] CR2: 00007f8972b04000 CR3: 00000001024a6002 CR4:
00000000000206e0
[   16.146493] Call Trace:
[   16.146533]  <TASK>
[   16.146536]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.153113]  ? __warn+0x81/0x130
[   16.153193]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.154796]  ? report_bug+0x191/0x1c0
[   16.155433]  ? handle_bug+0x3c/0x80
[   16.155468]  ? exc_invalid_op+0x17/0x70
[   16.155472]  ? asm_exc_invalid_op+0x1a/0x20
[   16.155498]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.159010]  xfs_rename+0xaf9/0xe50 [xfs]
[   16.159285]  xfs_vn_rename+0xfe/0x170 [xfs]
[   16.161080]  ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[   16.161137]  vfs_rename+0xb7e/0xd40
[   16.163241]  ? do_renameat2+0x57a/0x5f0
[   16.163256]  do_renameat2+0x57a/0x5f0
[   16.163277]  __x64_sys_rename+0x43/0x50
[   16.165707]  do_syscall_64+0x60/0xc0
[   16.165740]  ? do_syscall_64+0x6c/0xc0
[   16.165746]  ? do_syscall_64+0x6c/0xc0
[   16.165749]  ? do_syscall_64+0x6c/0xc0
[   16.165754]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   16.165776] RIP: 0033:0x7f89b7277997
[   16.170373] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00
b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[   16.170377] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX:
0000000000000052
[   16.170382] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX:
00007f89b7277997
[   16.170384] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI:
00007f8972b5fb30
[   16.170386] RBP: 00007f8975544480 R08: 0000000000000009 R09:
0000000000000fa0
[   16.170388] R10: 0000000000000000 R11: 0000000000000246 R12:
00007f89734dfca0
[   16.170399] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15:
00007f8975544480
[   16.170407]  </TASK>
[   16.170408] ---[ end trace 0000000000000000 ]---
[   16.170419] XFS (vda2): Internal error xfs_trans_cancel at line 1104 of file
fs/xfs/xfs_trans.c.  Caller xfs_rename+0x613/0xe50 [xfs]
[   16.201330] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Tainted: G        W       
  6.5.0-0-amd64 #1  Debian 6.5~rc4-1~exp1
[   16.201335] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1
01/01/2007
[   16.201337] Call Trace:
[   16.201377]  <TASK>
[   16.201382]  dump_stack_lvl+0x47/0x60
[   16.201412]  xfs_trans_cancel+0x131/0x150 [xfs]
[   16.214058]  xfs_rename+0x613/0xe50 [xfs]
[   16.218956]  xfs_vn_rename+0xfe/0x170 [xfs]
[   16.218956]  ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[   16.222083]  vfs_rename+0xb7e/0xd40
[   16.222083]  ? do_renameat2+0x57a/0x5f0
[   16.222083]  do_renameat2+0x57a/0x5f0
[   16.222083]  __x64_sys_rename+0x43/0x50
[   16.222083]  do_syscall_64+0x60/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   16.222083] RIP: 0033:0x7f89b7277997
[   16.222083] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00
b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[   16.222083] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX:
0000000000000052
[   16.222083] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX:
00007f89b7277997
[   16.222083] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI:
00007f8972b5fb30
[   16.222083] RBP: 00007f8975544480 R08: 0000000000000009 R09:
0000000000000fa0
[   16.222083] R10: 0000000000000000 R11: 0000000000000246 R12:
00007f89734dfca0
[   16.222083] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15:
00007f8975544480
[   16.222083]  </TASK>
[   16.223078] XFS (vda2): Corruption of in-memory data (0x8) detected at
xfs_trans_cancel+0x14a/0x150 [xfs] (fs/xfs/xfs_trans.c:1105).  Shutting down
filesystem.
[   16.223941] systemd-journald[251]:
/var/log/journal/48d6a6734183b423cc1f60686f4553bb/system.journal: IO error,
rotating.
[   16.224548] XFS (vda2): Please unmount the filesystem and rectify the
problem(s)
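
Since the log above was hard-wrapped by the mailer, here is a small sketch for
rejoining the wrapped lines and pulling out only the XFS-related messages. It
assumes every real kernel log line starts with a "[ seconds.micros]" timestamp
and that anything else is a continuation of the previous line; unwrap() and
xfs_events() are hypothetical helper names, not part of any existing tool.

```python
import re

# A genuine kernel log line starts with a bracketed timestamp.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]")

# Lines of interest: warnings raised from XFS code, or "XFS (dev):" messages.
XFS_LINE = re.compile(r"WARNING: .* at fs/xfs/|XFS \(\w+\):")


def unwrap(lines):
    """Join mail-wrapped continuation lines onto the preceding log line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not out:
            out.append(line)
        else:
            out[-1] = out[-1].rstrip() + " " + line.strip()
    return out


def xfs_events(lines):
    """Return the unwrapped log lines that mention XFS warnings or errors."""
    return [line for line in unwrap(lines) if XFS_LINE.search(line)]


# A few lines copied from the log above, still in their wrapped form.
sample = """\
[   16.115856] WARNING: CPU: 2 PID: 646 at fs/xfs/xfs_inode.c:1831
xfs_iunlink+0x165/0x1e0 [xfs]
[   16.146493] Call Trace:
[   16.224548] XFS (vda2): Please unmount the filesystem and rectify the
problem(s)""".splitlines()

for event in xfs_events(sample):
    print(event)
```

Run over the full log, this should leave the xfs_iunlink warning, the
xfs_trans_cancel internal error, the in-memory corruption shutdown and the
final "Please unmount" message.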
