On 2/15/23 11:53, Tudor Ambarus wrote:
On 2/15/23 11:46, Tudor Ambarus wrote:
Hi, Ted!
On 2/15/23 04:32, Theodore Ts'o wrote:
On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
On Wed, Jan 4, 2023 at 03:17, Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
For 1k-block filesystems, the filesystem starts at block 1, not
block 0.
If start_fsb is 0, it will be bumped up to s_first_data_block. Then
ext4_get_group_no_and_offset() doesn't know what to do and returns garbage
results (blockgroup 2^32-1). The underflow makes the index
exceed es->s_groups_count in ext4_get_group_info() and triggers the
BUG_ON.
Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Reported-by: syzbot+6be2b977c89f79b6b153@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Jun Nie <jun.nie@xxxxxxxxxx>
---
fs/ext4/fsmap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 4493ef0c715e..1aef127b0634 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
 		if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
 			memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
+		/*
+		 * Re-check the range after above limit operation and reject
+		 * 1K fs on block 0 as fs should start block 1. */
+		if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
+			continue;
...and if this filesystem has 4k blocks, and therefore *does* define a
block 0?
Yes, this is a real corner case test :-)
So I'm really nervous about this change. I don't understand the code;
and I don't understand how the reproducer works. I can certainly
reproduce it using the reproducer found here[1], but it seems to
require running multiple processes all creating loop devices and then
running FS_IOC_GETFSMAP.
[1]
https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
If I change the reproducer to just run the execute_one() once, it
doesn't trigger the bug. It seems to only trigger when you have
multiple processes all racing to create a loop device, mount the file
system, try running FS_IOC_GETFSMAP --- and then delete the loop device
without actually unmounting the file system. Which is **weird**.
I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
and that doesn't trigger it either.
And I don't understand the reply to Darrick's question about why it's
safe to add the check since for 4k block file systems, block 0 *is*
valid.
So if someone can explain to me what is going on here with this code
(there are too many abstractions and what's going on with keys is just
making my head hurt), *and* what the change actually does, and how to
reproduce the problem with a ***simple*** reproducer -- the syzbot
mess doesn't count, that would be great. But applying a change that I
don't understand to code I don't understand, to fix a reproducer which
I also don't understand, just doesn't make me feel comfortable.
Let me share what I understood until now. The low key is zeroed. The
high key is defined and uses a fmr_physical of value zero, which is
smaller than the first data block for the 1k-block ext4 fs (which starts
at offset 1024).
-> ext4_getfsmap_datadev()
   keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
   bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
   start_fsb = keys[0].fmr_physical = 1, end_fsb = keys[1].fmr_physical = 0
   -> ext4_get_group_no_and_offset()
      blocknr = 1, le32_to_cpu(es->s_first_data_block) = 1
      start_ag = 0, first_cluster = 0
   -> ext4_get_group_no_and_offset()
      blocknr = 0, le32_to_cpu(es->s_first_data_block) = 1
      end_ag = 4294967295, last_cluster = 8191
Because of poor key validation we get a wrong end_ag, which eventually
causes the BUG_ON.
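
The wraparound in the trace above can be checked in userspace. What follows is a minimal sketch of the arithmetic, not the kernel function itself: group_no_and_offset() and struct group_pos are hypothetical names, and the sketch assumes 8192 blocks per group (the usual value for a 1k-block ext4 fs, which matches last_cluster = 8191) and a cluster size equal to the block size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the result of
 * ext4_get_group_no_and_offset(); not a kernel type. */
struct group_pos {
	uint32_t group;   /* plays the role of ext4_group_t (32 bits) */
	uint32_t offset;  /* block offset inside the group */
};

/* Sketch of the arithmetic only. Assumes cluster size == block size. */
static inline struct group_pos group_no_and_offset(uint64_t blocknr,
						   uint32_t first_data_block,
						   uint32_t blocks_per_group)
{
	struct group_pos pos;

	assert(blocks_per_group != 0);

	/* Unsigned 64-bit subtraction: wraps to 2^64-1 when
	 * blocknr is smaller than first_data_block. */
	blocknr -= first_data_block;

	pos.offset = (uint32_t)(blocknr % blocks_per_group);
	/* The 64-bit quotient is truncated to a 32-bit group number. */
	pos.group = (uint32_t)(blocknr / blocks_per_group);
	return pos;
}
```

With first_data_block = 1 and blocks_per_group = 8192, calling this with blocknr = 1 gives group 0, offset 0, while blocknr = 0 gives group 4294967295, offset 8191, the same values as start_ag/end_ag in the trace.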
Then there's a loop that runs while info->gfi_agno <= end_ag; that
will trigger the BUG_ON in ext4_get_group_info() as the group number
exceeds EXT4_SB(sb)->s_groups_count:
-> ext4_mballoc_query_range()
-> ext4_mb_load_buddy()
-> ext4_mb_load_buddy_gfp()
-> ext4_get_group_info()
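
A rough sketch of why that walk ends in the BUG_ON, under stated assumptions: first_invalid_group() is a hypothetical helper, not kernel code, and groups_count = 1 is inferred from eofs = 256 with an assumed 8192 blocks per group. It only models the group loop and the bounds check, nothing else.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the group walk: returns the first group number
 * in [start_ag, end_ag] that is not below groups_count, or -1 if every
 * group visited is valid. In the kernel the out-of-range group would
 * hit the BUG_ON in ext4_get_group_info() instead of returning. */
static inline int64_t first_invalid_group(uint32_t start_ag, uint32_t end_ag,
					  uint32_t groups_count)
{
	uint64_t agno; /* 64-bit so end_ag == UINT32_MAX cannot loop forever */

	assert(groups_count > 0); /* a mounted fs has at least one group */

	for (agno = start_ag; agno <= end_ag; agno++) {
		if (agno >= groups_count)
			return (int64_t)agno;
	}
	return -1;
}
```

With start_ag = 0, end_ag = 4294967295 and a single block group, the walk leaves the valid range at group 1, right after the first iteration.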
It's an out-of-bounds request, and Darrick suggested not returning any
mapping for the byte range 0-1023 on the 1k-block filesystem. The
alternative would be to return -EINVAL when the high key has an
fmr_physical of zero on the 1k-block fs.
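
The two options can be sketched as a range check done after the low key has been bumped up to the first data block. This is only an illustration under assumptions, not a proposed patch: check_range() and enum range_policy are hypothetical names, and the keys are expressed in filesystem blocks.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical policy switch for the two alternatives discussed. */
enum range_policy {
	RANGE_SKIP,   /* report no mappings for the range */
	RANGE_REJECT, /* fail the request with -EINVAL */
};

/* Returns 1 if [low, high] still contains blocks to query after low is
 * clamped up to first_data_block, 0 if there is nothing to report, or
 * -EINVAL if the policy is to reject such a request. */
static inline int check_range(uint64_t low, uint64_t high,
			      uint64_t first_data_block,
			      enum range_policy policy)
{
	assert(policy == RANGE_SKIP || policy == RANGE_REJECT);

	if (low < first_data_block)
		low = first_data_block;

	/* High key ends before the first data block: empty range. */
	if (high < low)
		return policy == RANGE_REJECT ? -EINVAL : 0;

	return 1;
}
```

Because the comparison is against the first data block rather than against a literal zero, a 4k-block filesystem (first data block 0) still gets block 0 queried, while the 1k-block case with both keys at 0 is filtered out.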
In order to reproduce this, one would have to create a 1k-block ext4 fs
and pass a high key with an fmr_physical of zero, thus I would
expect to reproduce it with something like this:
xfs_io -c 'fsmap -d 0 0' /mnt/scratch
However, when doing this I notice that in
xfsprogs-dev/io/fsmap.c l->fmr_device and h->fmr_device have value
zero, FS_IOC_GETFSMAP is called, and then we receive no entries
(head->fmh_entries = 0). Now I'm trying to see what I am doing wrong and
how to reproduce the bug.
What I think happens with the reproducer that I proposed is that when
both {l, h}->fmr_device have value zero, the code exits early before
getting the fsmap:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/ext4/fsmap.c?h=v6.2-rc8#n691
Also, to my untrained fs eye it seems that xfs_io's [-d|-l|-r] fsmap
options are intended only for XFS, as the {data, log, realtime} sections
are XFS specific. I wonder why "struct fs_path" from libfrog/paths.h is
not renamed to "struct xfs_path"; it would have been less confusing.
It looks like xfs_io has no support for querying a start and end
offset when asking for a fsmap on an ext4 fs. I'm checking how I can
extend the xfs_io fsmap ext4 support to validate my assumptions.
Cheers,
ta