Re: [PATCH] xfs: fix incorrect usage of xfs_btree_check_block

Guo Xuenan <guoxuenan@xxxxxxxxxxxxxxx> · Wed, 30 Nov 2022 10:51:51 +0800

On 2022/11/30 9:06, Darrick J. Wong wrote:
On Tue, Nov 29, 2022 at 11:52:57AM +0800, Guo Xuenan wrote:
Hi Darrick，

I asked my colleagues about email-sending problems.
I am not the only one with the problem :( so,my reply of you question here
also sent faild. With my colleague's advice, I change a new email address.
Hope from now on, my email can be received by mailing list :)
On 2022/11/18 3:07, Darrick J. Wong wrote:
On Thu, Nov 17, 2022 at 10:13:46AM -0600, Eric Sandeen wrote:
On 11/7/22 7:50 PM, Guo Xuenan wrote:
On 2022/11/8 0:58, Darrick J. Wong wrote:
On Thu, Nov 03, 2022 at 07:37:09PM +0800, Guo Xuenan wrote:
xfs_btree_check_block contains a tag XFS_ERRTAG_BTREE_CHECK_{L,S}BLOCK,
it is a fault injection tag, better not use it in the macro ASSERT.

Since with XFS_DEBUG setting up, we can always trigger assert by `echo 1
/sys/fs/xfs/${disk}/errortag/btree_chk_{s,l}blk`.
It's confusing and strange.
Please be more specific about how this is confusing or strange.
I meant in current code, the ASSERT will alway happen,when we
`echo 1 > /sys/fs/xfs/${disk}/errortag/btree_chk_{s,l}blk`.
xfs_btree_islastblock
    ->ASSERT(block && xfs_btree_check_block(cur, block, level, bp) == 0);
      ->xfs_btree_check_{l/s}block
        ->XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BTREE_CHECK_{S,L}BLOCK)
we can use error injection to trigger this ASSERT.
Hmmm...

I think ASERRT macro and error injection are to find some effective problems,
not to create some kernel panic.
You can avoid a panic by turning XFS_ASSERT_FATAL off in Kconfig, or
at runtime by setting fs.xfs.bug_on_assert to 0, but ...

So, putting the error injection function in
ASSERT is a little strange.
Ok, so I think the argument is that in the default config, setting this error
injection tag will immediately result in a system panic, which probably isn't
what we want.  Is my understanding correct?
Yes, Eric, you are right, that's just what i mean.
But in the bigger picture, isn't this:

ASSERT(block && xfs_btree_check_block(cur, block, level, bp) == 0);

putting a disk corruption check into an ASSERT? That in itself seems a bit
suspect.  However, the ASSERT was all introduced in:

commit 27d9ee577dccec94fb0fc1a14728de64db342f86
Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Date:   Wed Nov 6 08:47:09 2019 -0800

      xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock
      Coverity points out that xfs_btree_islastblock doesn't check the return
      value of xfs_btree_check_block.  Since the question "Does the cursor
      point to the last block in this level?" only makes sense if the caller
      previously performed a lookup or seek operation, the block should
      already have been checked.
      Therefore, check the return value in an ASSERT and turn the whole thing
      into a static inline predicate.
      Coverity-id: 114069
      Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
      Reviewed-by: Christoph Hellwig <hch@xxxxxx>

which seems to imply that we really should not get here with a corrupt block during
normal operation.
It did, xfs_alloc_ag_vextent_near->xfs_btree_islastblock, see below for more
details.
Perhaps the error tag can get set after the block "should already have been checked"
but before this test in the ASSERT?
What I want to know here is *how* do we get to the point of tripping
this assertion via debugknob?  Won't the lookup or seek operation have
already checked the block and failed with EFSCORRUPTED?  And shouldn't
that be enough to stop whatever code calls xfs_btree_islastblock?  If
not, how do we get there?

Seriously, I don't want to burn more time discussing where and how to
fail on debugging knobs when there are all these other **far more
serious** corruption and deadlock problems that people are trying to get
merged.

Tell me specifically how to make the system fail.  "It's confusing and
strange" is not good enough.
Sorry,It's my fault. I didn't explain it clearly enough,and I don't think
this is a serious problem either. The background of the matter is we have
some testcases to test xfs, some of them are fault inject cases,and make
our testcase failed.

As Eric pointed out "You can avoid a panic by turning XFS_ASSERT_FATAL off in Kconfig"
It's just a little strange to me,and always make our testcase failed.
I just report it and if you are not interested in this patch,I am not strongly request
this to make it merged.

Here is the *how*, show you the test code and call trace.

*This is the test case:*
function do_test()
{
	echo "--- in do_test ---"
	_mount "$mount_options" $disk $mountpoint
	_check_fsstress $mountpoint # just do: fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null
	echo 1 > /sys/fs/xfs/sda/errortag/btree_chk_sblk
Ok, now we're finally getting somewhere.  You kick off fsstress, and
only *then* turn on the debugging knob.  If it happens that the knob
gets turned on after the cntbt lookup succeeds but before the call to
xfs_btree_islastblock, then we *can* end up in the situation where a
previously checked btree block suddenly starts returning EFSCORRUPTED
from xfs_btree_check_block.  Kaboom.

Looking back at commit 27d9ee577dcce, I think the point of all this was
to make sure that the cursor has actually performed a lookup, and that
the btree block at whatever level we're asking about is ok.

If the caller hasn't ever done a lookup, the bc_levels array will be
empty, so cur->bc_levels[level].bp pointer will be NULL.  The call to
xfs_btree_get_block will crash anyway, so the "ASSERT(block);" part is
pointless.

If the caller did a lookup but the lookup failed due to block
corruption, the corresponding cur->bc_levels[level].bp pointer will also
be NULL, and we'll still crash.  The "ASSERT(xfs_btree_check_block);"
logic is also unnecessary.

If the cursor level points to an inode root, the block buffer will be
incore, so it had better always be consistent.

If the caller ignores a failed lookup after a successful one and calls
this function, the cursor state is garbage and the assert wouldn't have
tripped anyway.

So get rid of the assert, and please edit the commit message to include
all these details.

OK，get :)

Best wishes
Xuenan
--D

	sleep $ERROR_TEST_RUN_TIME
}

round=0
do_pre
while [ $round -lt $round_max ]
do
	echo "******* round $round ********"
	do_test
	round=$(($round+1))
done

*This is the call trace:*
XFS (sda): fsstress should use fallocate; XFS_IOC_{ALLOC,FREE}SP ioctl
unsupported
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS (sda): Corruption of in-memory data (0x8) detected at
xfs_defer_finish_noroll+0x100c/0x19e0 (fs/xfs/libxfs/xfs_defer.c:573).
Shutting down filesystem.
XFS (sda): Please unmount the filesystem and rectify the problem(s)
xfs filesystem being mounted at /mnt/test supports timestamps until 2038
(0x7fffffff)
xfs filesystem being mounted at /mnt/test supports timestamps until 2038
(0x7fffffff)
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
XFS: Assertion failed: block && xfs_btree_check_block(cur, block, level, bp)
== 0, file: fs/xfs/libxfs/xfs_btree.h, line: 559
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
CPU: 0 PID: 30 Comm: kworker/u4:2 Not tainted 6.1.0-rc4-01347-geab06d36b0ac
#510
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34
04/01/2014
Workqueue: writeback wb_workfn (flush-8:0)
RIP: 0010:assfail+0x7a/0x8a
Code: 07 48 c1 e9 03 8a 14 11 38 c2 7f 10 84 d2 74 0c 48 c7 c7 b0 72 03 8c
e8 3d 45 d5 fd 80 3d f6 ab 4e 08 00 74 07 e8 df e4 91 fd <0f> 0b e8 d8 e4 91
fd 0f 0b 5b 5d 41 5c 41 5d c3 e8 ca e4 91 fd 48
RSP: 0018:ffff888010ab6ab8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff84427ce0 RCX: 0000000000000000
RDX: ffff888010aa8040 RSI: ffffffff83b4c6c1 RDI: ffffed1002156d46
RBP: 0000000000000000 R08: 000000000000007d R09: ffff888010ab6777
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
R10: ffffed1002156cee R11: 0000000000000001 R12: ffffffff84427d20
R13: 000000000000022f R14: ffff888010ab6b88 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88805a400000(0000) knlGS:0000000000000000
XFS (sda): Injecting error (false) at file fs/xfs/libxfs/xfs_btree.c, line
243, on filesystem "sda"
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
XFS (sda): xfs_inactive_ifree: xfs_ifree returned error -117
CR2: 0000000001d192b8 CR3: 0000000017d0e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <TASK>
  xfs_alloc_ag_vextent_near.constprop.0+0x945/0x13b0
  xfs_alloc_ag_vextent+0x7a6/0xdd0
  xfs_alloc_vextent+0x13d2/0x1cd0
  xfs_bmap_btalloc+0x778/0x1450
  xfs_bmapi_allocate+0x348/0x1260
  xfs_bmapi_convert_delalloc+0x669/0xd50
  xfs_map_blocks+0x4f7/0xc50
  iomap_do_writepage+0x785/0x1ed0
  write_cache_pages+0x39e/0xd60
  iomap_writepages+0x50/0xb0
  xfs_vm_writepages+0x128/0x1b0
  do_writepages+0x196/0x640
  __writeback_single_inode+0xae/0x8d0
  writeback_sb_inodes+0x4b8/0xc00
  __writeback_inodes_wb+0xc1/0x220
  wb_writeback+0x590/0x730
  wb_workfn+0x7a4/0xca0
  process_one_work+0x6d1/0xf40
  worker_thread+0x5b9/0xf60
  kthread+0x287/0x330
  ret_from_fork+0x1f/0x30
  </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:assfail+0x7a/0x8a
Code: 07 48 c1 e9 03 8a 14 11 38 c2 7f 10 84 d2 74 0c 48 c7 c7 b0 72 03 8c
e8 3d 45 d5 fd 80 3d f6 ab 4e 08 00 74 07 e8 df e4 91 fd <0f> 0b e8 d8 e4 91
fd 0f 0b 5b 5d 41 5c 41 5d c3 e8 ca e4 91 fd 48
RSP: 0018:ffff888010ab6ab8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff84427ce0 RCX: 0000000000000000
RDX: ffff888010aa8040 RSI: ffffffff83b4c6c1 RDI: ffffed1002156d46
RBP: 0000000000000000 R08: 000000000000007d R09: ffff888010ab6777
R10: ffffed1002156cee R11: 0000000000000001 R12: ffffffff84427d20
R13: 000000000000022f R14: ffff888010ab6b88 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88805a400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001d192b8 CR3: 0000000017d0e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled

---[ end Kernel panic - not syncing: Fatal exception ]---

Best wishes,
Xuenan
--D

-Eric