Re: [RFCv2 0/4] ext4: bmap & fiemap conversion to use iomap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 05, 2020 at 06:17:44PM +0530, Ritesh Harjani wrote:
> 
> 
> On 1/30/20 11:04 PM, Ritesh Harjani wrote:
> > 
> > 
> > On 1/30/20 9:30 PM, Darrick J. Wong wrote:
> > > On Tue, Jan 28, 2020 at 03:48:24PM +0530, Ritesh Harjani wrote:
> > > > Hello All,
> > > > 
> > > > Background
> > > > ==========
> > > > There are RFCv2 patches to move ext4 bmap & fiemap calls to use
> > > > iomap APIs.
> > > > This reduces the users of ext4_get_block API and thus a step
> > > > towards getting
> > > > rid of buffer_heads from ext4. Also reduces a lot of code by
> > > > making use of
> > > > existing iomap_ops (except for xattr implementation).
> > > > 
> > > > Testing (done on ext4 master branch)
> > > > ========
> > > > 'xfstests -g auto' passes with default mkfs/mount configuration
> > > > (v/s which also pass with vanilla kernel without this patch). Except
> > > > generic/473 which also failes on XFS. This seems to be the test
> > > > case issue
> > > > since it expects the data in slightly different way as compared
> > > > to what iomap
> > > > returns.
> > > > Point 2.a below describes more about this.
> > > > 
> > > > Observations/Review required
> > > > ============================
> > > > 1. bmap related old v/s new method differences:-
> > > >     a. In case if addr > INT_MAX, it issues a warning and
> > > >        returns 0 as the block no. While earlier it used to return the
> > > >        truncated value with no warning.
> > > 
> > > Good...
> > > 
> > > >     b. block no. is only returned in case of iomap->type is
> > > > IOMAP_MAPPED,
> > > >        but not when iomap->type is IOMAP_UNWRITTEN. While with
> > > > previously
> > > >        we used to get block no. for both of above cases.
> > > 
> > > Assuming the only remaining usecase of bmap is to tell old bootloaders
> > > where to find vmlinuz blocks on disk, I don't see much reason to map
> > > unwritten blocks -- there's no data there, and if your bootloader writes
> > > to the filesystem(!) then you can't read whatever was written there
> > > anyway.
> > 
> > Yes, no objection there. Just wanted to get it reviewed.
> > 
> > 
> > > 
> > > Uh, can we put this ioctl on the deprecation list, please? :)
> > > 
> > > > 2. Fiemap related old v/s new method differences:-
> > > >     a. iomap_fiemap returns the disk extent information in exact
> > > >        correspondence with start of user requested logical
> > > > offset till the
> > > >        length requested by user. While in previous implementation the
> > > >        returned information used to give the complete extent
> > > > information if
> > > >        the range requested by user lies in between the extent mapping.
> > > 
> > > This is a topic of much disagreement.  The FIEMAP documentation says
> > > that the call must return data for the requested range, but *may* return
> > > a mapping that extends beyond the requested range.
> > > 
> > > XFS (and now iomap) only return data for the requested range, whereas
> > > ext4 has (had?) the behavior you describe.  generic/473 was an attempt
> > > to enforce the ext4 behavior across all filesystems, but I put it in my
> > > dead list and never run it.
> > > 
> > > So it's a behavioral change, but the new behavior isn't forbidden.
> > 
> > Sure, thanks.
> > 
> > > 
> > > >     b. iomap_fiemap adds the FIEMAP_EXTENT_LAST flag also at the last
> > > >        fiemap_extent mapping range requested by the user via fm_length (
> > > >        if that has a valid mapped extent on the disk).
> > > 
> > > That sounds like a bug.  _LAST is supposed to be set on the last extent
> > > in the file, not the last record in the queried dataset.
> > 
> > Thought so too, sure will spend some time to try fixing it.
> 
> Looked into this. I think below should fix our above reported problem with
> current iomap code.
> If no objection I will send send PATCHv3 with below fix as the first
> patch in the series.
> 
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index bccf305ea9ce..ee53991810d5 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -100,7 +100,12 @@ int iomap_fiemap(struct inode *inode, struct
> fiemap_extent_info *fi,
>         }
> 
>         if (ctx.prev.type != IOMAP_HOLE) {
> -               ret = iomap_to_fiemap(fi, &ctx.prev, FIEMAP_EXTENT_LAST);
> +               u32 flags = 0;
> +               loff_t isize = i_size_read(inode);
> +
> +               if (ctx.prev.offset + ctx.prev.length >= isize)

What happens if ctx.prev actually is the last iomap in the file, but
isize extends beyond that?  e.g.,

# xfs_io -f -c 'pwrite 0 64k' /a
# truncate -s 100m /a
# filefrag -v /a

I think we need the fiemap variant of the iomap_begin functions to pass
a flag in the iomap that the fiemap implementation can pick up.

--D

> +                       flags |= FIEMAP_EXTENT_LAST;
> +               ret = iomap_to_fiemap(fi, &ctx.prev, flags);
>                 if (ret < 0)
>                         return ret;
>         }
> 
> 
> -ritesh
> 
> 
> > 
> > 
> > > 
> > > > But if the user
> > > >        requested for more fm_length which could not be mapped in
> > > > the last
> > > >        fiemap_extent, then the flag is not set.
> > > 
> > > Yes... if there were more extents to map than there was space in the map
> > > array, then _LAST should remain unset to encourage userspace to come
> > > back for the rest of the mappings.
> > > 
> > > (Unless maybe I'm misunderstanding here...)
> > > 
> > > > e.g. output for above differences 2.a & 2.b
> > > > ===========================================
> > > > create a file with below cmds.
> > > > 1. fallocate -o 0 -l 8K testfile.txt
> > > > 2. xfs_io -c "pwrite 8K 8K" testfile.txt
> > > > 3. check extent mapping:- xfs_io -c "fiemap -v" testfile.txt
> > > > 4. run this binary on with and without these patches:- ./a.out
> > > > (test_fiemap_diff.c) [4]
> > > > 
> > > > o/p of xfs_io -c "fiemap -v"
> > > > ============================================
> > > > With this patch on patched kernel:-
> > > > testfile.txt:
> > > >   EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
> > > >     0: [0..15]:         122802736..122802751    16 0x800
> > > >     1: [16..31]:        122687536..122687551    16   0x1
> > > > 
> > > > without patch on vanilla kernel (no difference):-
> > > > testfile.txt:
> > > >   EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
> > > >     0: [0..15]:         332211376..332211391    16 0x800
> > > >     1: [16..31]:        332722392..332722407    16   0x1
> > > > 
> > > > 
> > > > o/p of a.out without patch:-
> > > > ================
> > > > riteshh-> ./a.out
> > > > logical: [       0..      15] phys: 332211376..332211391 flags:
> > > > 0x800 tot: 16
> > > > (0) extent flag = 2048
> > > > 
> > > > o/p of a.out with patch (both point 2.a & 2.b could be seen)
> > > > =======================
> > > > riteshh-> ./a.out
> > > > logical: [       0..       7] phys: 122802736..122802743 flags:
> > > > 0x801 tot: 8
> > > > (0) extent flag = 2049
> > > > 
> > > > FYI - In test_fiemap_diff.c test we had
> > > > a. fm_extent_count = 1
> > > > b. fm_start = 0
> > > > c. fm_length = 4K
> > > > Whereas when we change fm_extent_count = 32, then we don't see
> > > > any difference.
> > > > 
> > > > e.g. output for above difference listed in point 1.b
> > > > ====================================================
> > > > 
> > > > o/p without patch (block no returned for unwritten block as well)
> > > > =========Testing IOCTL FIBMAP=========
> > > > File size = 16384, blkcnt = 4, blocksize = 4096
> > > >    0   41526422
> > > >    1   41526423
> > > >    2   41590299
> > > >    3   41590300
> > > > 
> > > > o/p with patch (0 returned for unwritten block)
> > > > =========Testing IOCTL FIBMAP=========
> > > > File size = 16384, blkcnt = 4, blocksize = 4096
> > > >    0          0          0
> > > >    1          0          0
> > > >    2   15335942      29953
> > > >    3   15335943      29953
> > > > 
> > > > 
> > > > Summary:-
> > > > ========
> > > > Due to some of the observational differences to user, listed above,
> > > > requesting to please help with a careful review in moving this to iomap.
> > > > Digging into some older threads, it looks like these differences
> > > > should be fine,
> > > > since the same tools have been working fine with XFS (which uses
> > > > iomap based
> > > > implementation) [1]
> > > > Also as Ted suggested in [3]: Fiemap & bmap spec could be made
> > > > based on the ext4
> > > > implementation. But since all the tools also work with xfs which
> > > > uses iomap
> > > > based fiemap, so we should be good there.
> > > 
> > > <nod> Thanks for the worked example output. :)
> > 
> > Thanks for the review. :)
> > 
> > ritesh
> > 
> > 
> > > 
> > > --D
> > > 
> > > > 
> > > > References of some previous discussions:
> > > > =======================================
> > > > [1]: https://www.spinics.net/lists/linux-fsdevel/msg128370.html
> > > > [2]: https://www.spinics.net/lists/linux-fsdevel/msg127675.html
> > > > [3]: https://www.spinics.net/lists/linux-fsdevel/msg128368.html
> > > > [4]: https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/tools/test_fiemap_diff.c
> > > > 
> > > > [RFCv1]: https://www.spinics.net/lists/linux-ext4/msg67077.html
> > > > 
> > > > 
> > > > Ritesh Harjani (4):
> > > >    ext4: Add IOMAP_F_MERGED for non-extent based mapping
> > > >    ext4: Optimize ext4_ext_precache for 0 depth
> > > >    ext4: Move ext4 bmap to use iomap infrastructure.
> > > >    ext4: Move ext4_fiemap to use iomap infrastructure
> > > > 
> > > >   fs/ext4/extents.c | 288 +++++++---------------------------------------
> > > >   fs/ext4/inline.c  |  41 -------
> > > >   fs/ext4/inode.c   |   6 +-
> > > >   3 files changed, 49 insertions(+), 286 deletions(-)
> > > > 
> > > > -- 
> > > > 2.21.0
> > > > 
> 



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux