Re: [PATCH] xfs: don't reuse busy extents on extent trim

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Wed, 11 Jan 2023 22:41:43 +0800

Hi,

On 2022/5/26 19:34, Amir Goldstein wrote:
On Tue, Feb 23, 2021 at 2:35 PM Brian Foster <bfoster@xxxxxxxxxx> wrote:

On Mon, Feb 22, 2021 at 10:27:45AM -0800, Darrick J. Wong wrote:
On Mon, Feb 22, 2021 at 10:34:42AM -0500, Brian Foster wrote:
Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.

As of commit 97d3ac75e5e0 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.

generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it.

...

Hi Brian,

This patch was one of my selected fixes to backport for v5.10.y.
It has a very scary looking commit message and the change seems
to be independent of any infrastructure changes(?).

The problem is that applying this patch to v5.10.y reliably reproduces
this buffer corruption assertion [*] with test xfs/076.

This happens on the kdevops system that is using loop devices over
sparse files inside qemu images. It does not reproduce on my small
VM at home.

Normally, I would just drop this patch from the stable candidates queue
and move on, but I thought you might be interested to investigate this
reliable reproducer, because maybe this system exercises an error
that is otherwise rare to hit.

It seemed weird to me that NOT reusing the extent would result in
data corruption, but it could indicate that reusing the extent was masking
the assertion and hiding another bug(?).

Can you think of another reason to explain the regression this fix
introduces to 5.10.y?

Do you care to investigate this failure or shall I just move on?

Thanks,
Amir.

[*]
: XFS (loop5): Internal error xfs_trans_cancel at line 954 of file
fs/xfs/xfs_trans.c.  Caller xfs_create+0x22f/0x590 [xfs]
: CPU: 3 PID: 25481 Comm: touch Kdump: loaded Tainted: G            E
    5.10.109-xfs-2 #8
: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
04/01/2014
: Call Trace:
:  dump_stack+0x6d/0x88
:  xfs_trans_cancel+0x17b/0x1a0 [xfs]
:  xfs_create+0x22f/0x590 [xfs]
:  xfs_generic_create+0x245/0x310 [xfs]
:  ? d_splice_alias+0x13a/0x3c0
:  path_openat+0xe3f/0x1080
:  do_filp_open+0x93/0x100
:  ? handle_mm_fault+0x148e/0x1690
:  ? __check_object_size+0x162/0x180
:  do_sys_openat2+0x228/0x2d0
:  do_sys_open+0x4b/0x80
:  do_syscall_64+0x33/0x80
:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
: RIP: 0033:0x7f36b02eff1e
: Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 e9 57 0d 00 8b 00
85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d
00 f0 ff ff 0
: RSP: 002b:00007ffe7ef6ca10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
: RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f36b02eff1e
: RDX: 0000000000000941 RSI: 00007ffe7ef6ebfa RDI: 00000000ffffff9c
: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
: R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000002
: R13: 00007ffe7ef6ebfa R14: 0000000000000001 R15: 0000000000000001
: XFS (loop5): xfs_do_force_shutdown(0x8) called from line 955 of file
fs/xfs/xfs_trans.c. Return address = ffffffffc08f5764
: XFS (loop5): Corruption of in-memory data detected.  Shutting down filesystem
: XFS (loop5): Please unmount the filesystem and rectify the problem(s)

(...just for the record) We also encountered this issue but without commit
 xfs: don't reuse busy extents on extent trim
applied in our 5.10 codebase...

Need to find some time to look into that...

[  413.283300] XFS (loop1): Internal error xfs_trans_cancel at line 950 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x219/0x590 [xfs]
[  413.284295] CPU: 0 PID: 27484 Comm: touch Tainted: G            E     5.10.134-13.an8.x86_64 #1
[  413.284296] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[  413.284297] Call Trace:
[  413.284314]  dump_stack+0x57/0x6e
[  413.284373]  xfs_trans_cancel+0xa3/0x110 [xfs]
[  413.284412]  xfs_create+0x219/0x590 [xfs]
[  413.284458]  xfs_generic_create+0x21f/0x2d0 [xfs]
[  413.284462]  path_openat+0xdee/0x1020
[  413.284464]  do_filp_open+0x80/0xd0
[  413.284467]  ? __check_object_size+0x16a/0x180
[  413.284469]  do_sys_openat2+0x207/0x2c0
[  413.284471]  do_sys_open+0x3b/0x60
[  413.284475]  do_syscall_64+0x33/0x40
[  413.284478]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  413.284481] RIP: 0033:0x7fe623920252
[  413.284482] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 55 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[  413.284483] RSP: 002b:00007ffd7a38ca70 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[  413.284485] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fe623920252
[  413.284486] RDX: 0000000000000941 RSI: 00007ffd7a38d79d RDI: 00000000ffffff9c
[  413.284487] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[  413.284488] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000001
[  413.284488] R13: 0000000000000001 R14: 00007ffd7a38d79d R15: 00007fe623bbf374
[  413.289856] XFS (loop1): xfs_do_force_shutdown(0x8) called from line 951 of file fs/xfs/xfs_trans.c. Return address = 000000003fe0b8ba
[  413.289858] XFS (loop1): Corruption of in-memory data detected.  Shutting down filesystem
[  413.290573] XFS (loop1): Please unmount the filesystem and rectify the problem(s)

Thanks,
Gao Xiang