On Thu, Aug 08, 2024 at 10:47:59PM GMT, Zizhi Wo wrote: > In commit 63ef7a35912d ("xfs: fix interval filtering in multi-step fsmap > queries"), Darrick has solved a fsmap bug about incorrect filter condition. > But I still notice two problems in fsmap: > > [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt > EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL > 0: 253:32 [0..7]: static fs metadata 0 (0..7) 8 > 1: 253:32 [8..23]: per-AG metadata 0 (8..23) 16 > 2: 253:32 [24..39]: inode btree 0 (24..39) 16 > ...... > > Bug 1: > [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 3 7' /mnt > [root@fedora ~]# > Normally, we should be able to get [3, 7), but we got nothing. > > Bug 2: > [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 15 20' /mnt > EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL > 0: 253:32 [8..23]: per-AG metadata 0 (8..23) 16 > Normally, we should be able to get [15, 20), but we obtained a whole > segment of extent. > > The first problem is caused by shifting. When the query interval is before > the first extent which can be find in btree, no records can meet the > requirement. And the gap will be obtained in the last query. However, > rec_daddr is calculated based on the start_block recorded in key[1], which > is converted by calling XFS_BB_TO_FSBT. Then if rec_daddr does not exceed > info->next_daddr, which means keys[1].fmr_physical >> (mp)->m_blkbb_log > <= info->next_daddr, no records will be displayed. In the above example, > 3 >> (mp)->m_blkbb_log = 0 and 7 >> (mp)->m_blkbb_log = 0, so the two are > reduced to 0 and the gap is ignored: > > before calculate ----------------> after shifting > 3(st) 7(ed) 0(st/ed) > |---------| | > sector size block size > > Resolve this issue by introducing the "tail_daddr" field in > xfs_getfsmap_info. This records |key[1].fmr_physical + key[1].length| at > the granularity of sector. If the current query is the last, the rec_daddr > is tail_daddr to prevent missing interval problems caused by shifting. You mention the introduction of the 'tail_daddr' field, but your patch does not introduce such field. Your patch description should properly match your patch. > We > only need to focus on the last query, because xfs disks are internally > aligned with disk blocksize that are powers of two and minimum 512, so > there is no problem with shifting in previous queries. > > The second problem is that the resulting range is not truncated precisely > according to the boundary. Even though they are related, I'd prefer these two fixes split this into 2 separated patches, not a single one. This makes reviewers lives easier to follow what you are fixing. > Currently, the query display mechanism for owner > and missing_owner is different. The query of missing_owner (e.g. freespace > in rmapbt/ unknown space in bnobt) is obtained by subtraction (gap), which > can accurately lock the range. In the query of owner which almostly finded > by btree, as long as certain conditions met, the entire interval is > recorded, regardless of the starting address of the key[0] and key[1] > incoming from the user state. Focus on the following scenario: > > a b > |-------| > query > c d > |----------------|-------------|----------------| > missing owner1 owner missing owner2 > > Currently query is directly displayed as [c, d), the correct display should > be [a, b). This problem is solved by calculating max(a, c) and min(b, d) to > identify the head and tail of the range. To be able to determine the bounds > of the low key, "start_daddr" is introduced in xfs_getfsmap_info. Here you properly describe what your patch is doing. Carlos > Although > in some scenarios, similar results can be achieved without introducing > "start_daddr" and relying solely on info->next_daddr (e.g. in bnobt), it is > ineffective for overlapping scenarios in rmapbt. > > After applying this patch, both of the above issues have been fixed (the > same applies to boundary queries for the log device and realtime device): > 1) > [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 3 7' /mnt > EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL > 0: 253:32 [3..6]: static fs metadata 0 (3..6) 4 > 2) > [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 15 20' /mnt > EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL > 0: 253:32 [15..19]: per-AG metadata 0 (15..19) 5 > > Note that due to the current query range being more precise, high.rm_owner > needs to be handled carefully. When it is 0, set it to the maximum value to > prevent missing intervals in rmapbt. > > Signed-off-by: Zizhi Wo <wozizhi@xxxxxxxxxx> > --- > fs/xfs/xfs_fsmap.c | 42 ++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 40 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c > index 85dbb46452ca..e7bb21497e5c 100644 > --- a/fs/xfs/xfs_fsmap.c > +++ b/fs/xfs/xfs_fsmap.c > @@ -162,6 +162,8 @@ struct xfs_getfsmap_info { > xfs_daddr_t next_daddr; /* next daddr we expect */ > /* daddr of low fsmap key when we're using the rtbitmap */ > xfs_daddr_t low_daddr; > + xfs_daddr_t start_daddr; /* daddr of low fsmap key */ > + xfs_daddr_t end_daddr; /* daddr of high fsmap key */ > u64 missing_owner; /* owner of holes */ > u32 dev; /* device id */ > /* > @@ -276,6 +278,7 @@ xfs_getfsmap_helper( > struct xfs_mount *mp = tp->t_mountp; > bool shared; > int error; > + int trunc_len; > > if (fatal_signal_pending(current)) > return -EINTR; > @@ -283,6 +286,13 @@ xfs_getfsmap_helper( > if (len_daddr == 0) > len_daddr = XFS_FSB_TO_BB(mp, rec->rm_blockcount); > > + /* > + * Determine the maximum boundary of the query to prepare for > + * subsequent truncation. > + */ > + if (info->last && info->end_daddr) > + rec_daddr = info->end_daddr; > + > /* > * Filter out records that start before our startpoint, if the > * caller requested that. > @@ -348,6 +358,21 @@ xfs_getfsmap_helper( > return error; > fmr.fmr_offset = XFS_FSB_TO_BB(mp, rec->rm_offset); > fmr.fmr_length = len_daddr; > + /* If the start address of the record is before the low key, truncate left. */ > + if (info->start_daddr > rec_daddr) { > + trunc_len = info->start_daddr - rec_daddr; > + fmr.fmr_physical += trunc_len; > + fmr.fmr_length -= trunc_len; > + /* need to update the offset in rmapbt. */ > + if (info->missing_owner == XFS_FMR_OWN_FREE) > + fmr.fmr_offset += trunc_len; > + } > + /* If the end address of the record exceeds the high key, truncate right. */ > + if (info->end_daddr) { > + fmr.fmr_length = umin(fmr.fmr_length, info->end_daddr - fmr.fmr_physical); > + if (fmr.fmr_length == 0) > + goto out; > + } > if (rec->rm_flags & XFS_RMAP_UNWRITTEN) > fmr.fmr_flags |= FMR_OF_PREALLOC; > if (rec->rm_flags & XFS_RMAP_ATTR_FORK) > @@ -364,7 +389,7 @@ xfs_getfsmap_helper( > > xfs_getfsmap_format(mp, &fmr, info); > out: > - rec_daddr += len_daddr; > + rec_daddr = fmr.fmr_physical + fmr.fmr_length; > if (info->next_daddr < rec_daddr) > info->next_daddr = rec_daddr; > return 0; > @@ -655,6 +680,13 @@ __xfs_getfsmap_datadev( > error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); > if (error) > break; > + /* > + * Set the owner of high_key to the maximum again to > + * prevent missing intervals during the query. > + */ > + if (info->high.rm_owner == 0 && > + info->missing_owner == XFS_FMR_OWN_FREE) > + info->high.rm_owner = ULLONG_MAX; > xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); > } > > @@ -946,6 +978,9 @@ xfs_getfsmap( > > info.next_daddr = head->fmh_keys[0].fmr_physical + > head->fmh_keys[0].fmr_length; > + /* Assignment is performed only for the first time. */ > + if (head->fmh_keys[0].fmr_length == 0) > + info.start_daddr = info.next_daddr; > info.fsmap_recs = fsmap_recs; > info.head = head; > > @@ -966,8 +1001,10 @@ xfs_getfsmap( > * low key, zero out the low key so that we get > * everything from the beginning. > */ > - if (handlers[i].dev == head->fmh_keys[1].fmr_device) > + if (handlers[i].dev == head->fmh_keys[1].fmr_device) { > dkeys[1] = head->fmh_keys[1]; > + info.end_daddr = dkeys[1].fmr_physical + dkeys[1].fmr_length; > + } > if (handlers[i].dev > head->fmh_keys[0].fmr_device) > memset(&dkeys[0], 0, sizeof(struct xfs_fsmap)); > > @@ -991,6 +1028,7 @@ xfs_getfsmap( > xfs_trans_cancel(tp); > tp = NULL; > info.next_daddr = 0; > + info.start_daddr = 0; > } > > if (tp) > -- > 2.39.2 > >