On 2024-02-11 03:58, Qu Wenruo wrote: > > > On 2024/2/11 01:04, Celeste Liu wrote: >> On 3/2/23 09:54, Qu Wenruo wrote: >>> [BUG] >>> During my scrub rework, I did a stupid thing like this: >>> >>> bio->bi_iter.bi_sector = stripe->logical; >>> btrfs_submit_bio(fs_info, bio, stripe->mirror_num); >>> >>> Above bi_sector assignment is using logical address directly, which >>> lacks ">> SECTOR_SHIFT". >>> >>> This results a read on a range which has no chunk mapping. >>> >>> This results the following crash: >>> >>> BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536 >>> assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387 >>> ------------[ cut here ]------------ >>> >>> Sure this is all my fault, but this shows a possible problem in real >>> world, that some bitflip in file extents/tree block can point to >>> unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT >>> is not configured, cause invalid pointer. >>> >>> [PROBLEMS] >>> In above call chain, we just don't handle the possible error from >>> btrfs_get_chunk_map() inside __btrfs_map_block(). >>> >>> [FIX] >>> The fix is pretty straightforward, just replace the ASSERT() with proper >>> error handling. >>> >>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> >>> --- >>> Changelog: >>> v2: >>> - Rebased to latest misc-next >>> The error path in bio.c is already fixed, thus only need to replace >>> the ASSERT() in __btrfs_map_block(). >>> --- >>> fs/btrfs/volumes.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index 4d479ac233a4..93bc45001e68 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -6242,7 +6242,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, >>> return -EINVAL; >>> >>> em = btrfs_get_chunk_map(fs_info, logical, *length); >>> - ASSERT(!IS_ERR(em)); >>> + if (IS_ERR(em)) >>> + return PTR_ERR(em); >>> >>> map = em->map_lookup; >>> data_stripes = nr_data_stripes(map); >> >> This bug affects 6.1.y LTS branch but no backport commit of this fix in >> 6.1.y branch. Please include it. Thanks. >> > Just want to make sure, are you hitting the ASSERT()? No, in non-debug build, ASSERT() is noop. So em will be a pointer whose value is errno and there is no any error handle for it. And then it trigger kernel NULL Pointer Dereference exception in btrfs_get_io_geometry() with em->map_lookup (Of course, it's not 0 but 0x5a (-EINVAL + offsetof(map_lookup)). But it still is a illegal memory access), this kernel thread was killed but the lock hold by this thread are not released. After that, all fs operations was block by the lock. > > If so, please provide the calltrace and btrfs-check output to pin down > the cause. It doesn't happen on my machine, I've notified the finder to send a backtrace to the mailing list > Just backporting the patch is only to make it a little more graceful, > but won't solve the root problem. No. In non debug build, ASSERT() is noop so it lead that there is no any error handle code. There is no RAII/SBRM, unexpected exit will lead the resources will not be released correctly. If it has unique holder (e.g. mutex), it will break all other code that need this resource! > > Thanks, > Qu