Re: [PATCH v2] btrfs: handle missing chunk mapping more gracefully

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-02-11 03:58, Qu Wenruo wrote:

> 
> 
> On 2024/2/11 01:04, Celeste Liu wrote:
>> On 3/2/23 09:54, Qu Wenruo wrote:
>>> [BUG]
>>> During my scrub rework, I did a stupid thing like this:
>>>
>>>           bio->bi_iter.bi_sector = stripe->logical;
>>>           btrfs_submit_bio(fs_info, bio, stripe->mirror_num);
>>>
>>> Above bi_sector assignment is using logical address directly, which
>>> lacks ">> SECTOR_SHIFT".
>>>
>>> This results a read on a range which has no chunk mapping.
>>>
>>> This results the following crash:
>>>
>>>    BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
>>>    assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387
>>>    ------------[ cut here ]------------
>>>
>>> Sure this is all my fault, but this shows a possible problem in real
>>> world, that some bitflip in file extents/tree block can point to
>>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
>>> is not configured, cause invalid pointer.
>>>
>>> [PROBLEMS]
>>> In above call chain, we just don't handle the possible error from
>>> btrfs_get_chunk_map() inside __btrfs_map_block().
>>>
>>> [FIX]
>>> The fix is pretty straightforward, just replace the ASSERT() with proper
>>> error handling.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>> ---
>>> Changelog:
>>> v2:
>>> - Rebased to latest misc-next
>>>     The error path in bio.c is already fixed, thus only need to replace
>>>     the ASSERT() in __btrfs_map_block().
>>> ---
>>>    fs/btrfs/volumes.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 4d479ac233a4..93bc45001e68 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>>>            return -EINVAL;
>>>
>>>        em = btrfs_get_chunk_map(fs_info, logical, *length);
>>> -    ASSERT(!IS_ERR(em));
>>> +    if (IS_ERR(em))
>>> +        return PTR_ERR(em);
>>>
>>>        map = em->map_lookup;
>>>        data_stripes = nr_data_stripes(map);
>>
>> This bug affects 6.1.y LTS branch but no backport commit of this fix in
>> 6.1.y branch. Please include it. Thanks.
>>
> Just want to make sure, are you hitting the ASSERT()?

No, in non-debug build, ASSERT() is noop. So em will be a pointer whose
value is errno and there is no any error handle for it. And then it
trigger kernel NULL Pointer Dereference exception in btrfs_get_io_geometry()
with em->map_lookup (Of course, it's not 0 but 0x5a
(-EINVAL + offsetof(map_lookup)). But it still is a illegal memory access),
this kernel thread was killed but the lock hold by this thread are not
released. After that, all fs operations was block by the lock.

> 
> If so, please provide the calltrace and btrfs-check output to pin down
> the cause.

It doesn't happen on my machine, I've notified the finder to send a
backtrace to the mailing list

> Just backporting the patch is only to make it a little more graceful,
> but won't solve the root problem.

No. In non debug build, ASSERT() is noop so it lead that there is no any
error handle code. There is no RAII/SBRM, unexpected exit will lead the
resources will not be released correctly. If it has unique holder (e.g. mutex),
it will break all other code that need this resource!

> 
> Thanks,
> Qu





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux