Re: [PATCH] xfs: fix incorrect usage of xfs_btree_check_block

Eric Sandeen <sandeen@xxxxxxxxxx> · Thu, 17 Nov 2022 10:13:46 -0600

On 11/7/22 7:50 PM, Guo Xuenan wrote:
> On 2022/11/8 0:58, Darrick J. Wong wrote:
>> On Thu, Nov 03, 2022 at 07:37:09PM +0800, Guo Xuenan wrote:
>>> xfs_btree_check_block contains a tag XFS_ERRTAG_BTREE_CHECK_{L,S}BLOCK,
>>> it is a fault injection tag, better not use it in the macro ASSERT.
>>>
>>> Since with XFS_DEBUG setting up, we can always trigger assert by `echo 1
>>>> /sys/fs/xfs/${disk}/errortag/btree_chk_{s,l}blk`.
>>> It's confusing and strange.
>> Please be more specific about how this is confusing or strange.
> I meant in current code, the ASSERT will alway happen,when we
> `echo 1 > /sys/fs/xfs/${disk}/errortag/btree_chk_{s,l}blk`.
> xfs_btree_islastblock
>   ->ASSERT(block && xfs_btree_check_block(cur, block, level, bp) == 0);
>     ->xfs_btree_check_{l/s}block
>       ->XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BTREE_CHECK_{S,L}BLOCK)
> we can use error injection to trigger this ASSERT.

Hmmm...

> I think ASERRT macro and error injection are to find some effective problems,
> not to create some kernel panic.

You can avoid a panic by turning XFS_ASSERT_FATAL off in Kconfig, or
at runtime by setting fs.xfs.bug_on_assert to 0, but ...

> So, putting the error injection function in
> ASSERT is a little strange.

Ok, so I think the argument is that in the default config, setting this error
injection tag will immediately result in a system panic, which probably isn't
what we want.  Is my understanding correct?

But in the bigger picture, isn't this:

ASSERT(block && xfs_btree_check_block(cur, block, level, bp) == 0);

putting a disk corruption check into an ASSERT? That in itself seems a bit
suspect.  However, the ASSERT was all introduced in:

commit 27d9ee577dccec94fb0fc1a14728de64db342f86
Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Date:   Wed Nov 6 08:47:09 2019 -0800

    xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock

    Coverity points out that xfs_btree_islastblock doesn't check the return
    value of xfs_btree_check_block.  Since the question "Does the cursor
    point to the last block in this level?" only makes sense if the caller
    previously performed a lookup or seek operation, the block should
    already have been checked.

    Therefore, check the return value in an ASSERT and turn the whole thing
    into a static inline predicate.

    Coverity-id: 114069
    Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

which seems to imply that we really should not get here with a corrupt block during
normal operation.

Perhaps the error tag can get set after the block "should already have been checked"
but before this test in the ASSERT?

-Eric