Re: [RFCv2 0/4] ext4: bmap & fiemap conversion to use iomap

Ritesh Harjani <riteshh@xxxxxxxxxxxxx> · Thu, 6 Feb 2020 10:56:18 +0530

On 2/5/20 9:27 PM, Darrick J. Wong wrote:
On Wed, Feb 05, 2020 at 06:17:44PM +0530, Ritesh Harjani wrote:


On 1/30/20 11:04 PM, Ritesh Harjani wrote:


On 1/30/20 9:30 PM, Darrick J. Wong wrote:
On Tue, Jan 28, 2020 at 03:48:24PM +0530, Ritesh Harjani wrote:
Hello All,

Background
==========
There are RFCv2 patches to move ext4 bmap & fiemap calls to use
iomap APIs.
This reduces the users of ext4_get_block API and thus a step
towards getting
rid of buffer_heads from ext4. Also reduces a lot of code by
making use of
existing iomap_ops (except for xattr implementation).

Testing (done on ext4 master branch)
========
'xfstests -g auto' passes with default mkfs/mount configuration
(v/s which also pass with vanilla kernel without this patch). Except
generic/473 which also failes on XFS. This seems to be the test
case issue
since it expects the data in slightly different way as compared
to what iomap
returns.
Point 2.a below describes more about this.

Observations/Review required
============================
1. bmap related old v/s new method differences:-
     a. In case if addr > INT_MAX, it issues a warning and
        returns 0 as the block no. While earlier it used to return the
        truncated value with no warning.

Good...

     b. block no. is only returned in case of iomap->type is
IOMAP_MAPPED,
        but not when iomap->type is IOMAP_UNWRITTEN. While with
previously
        we used to get block no. for both of above cases.

Assuming the only remaining usecase of bmap is to tell old bootloaders
where to find vmlinuz blocks on disk, I don't see much reason to map
unwritten blocks -- there's no data there, and if your bootloader writes
to the filesystem(!) then you can't read whatever was written there
anyway.

Yes, no objection there. Just wanted to get it reviewed.



Uh, can we put this ioctl on the deprecation list, please? :)

2. Fiemap related old v/s new method differences:-
     a. iomap_fiemap returns the disk extent information in exact
        correspondence with start of user requested logical
offset till the
        length requested by user. While in previous implementation the
        returned information used to give the complete extent
information if
        the range requested by user lies in between the extent mapping.

This is a topic of much disagreement.  The FIEMAP documentation says
that the call must return data for the requested range, but *may* return
a mapping that extends beyond the requested range.

XFS (and now iomap) only return data for the requested range, whereas
ext4 has (had?) the behavior you describe.  generic/473 was an attempt
to enforce the ext4 behavior across all filesystems, but I put it in my
dead list and never run it.

So it's a behavioral change, but the new behavior isn't forbidden.

Sure, thanks.


     b. iomap_fiemap adds the FIEMAP_EXTENT_LAST flag also at the last
        fiemap_extent mapping range requested by the user via fm_length (
        if that has a valid mapped extent on the disk).

That sounds like a bug.  _LAST is supposed to be set on the last extent
in the file, not the last record in the queried dataset.

Thought so too, sure will spend some time to try fixing it.

Looked into this. I think below should fix our above reported problem with
current iomap code.
If no objection I will send send PATCHv3 with below fix as the first
patch in the series.

diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index bccf305ea9ce..ee53991810d5 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -100,7 +100,12 @@ int iomap_fiemap(struct inode *inode, struct
fiemap_extent_info *fi,
         }

         if (ctx.prev.type != IOMAP_HOLE) {
-               ret = iomap_to_fiemap(fi, &ctx.prev, FIEMAP_EXTENT_LAST);
+               u32 flags = 0;
+               loff_t isize = i_size_read(inode);
+
+               if (ctx.prev.offset + ctx.prev.length >= isize)

What happens if ctx.prev actually is the last iomap in the file, but
isize extends beyond that?  e.g.,

# xfs_io -f -c 'pwrite 0 64k' /a
# truncate -s 100m /a
# filefrag -v /a

Err.. should never miss this truncate case.

Digging further, I see even generic_block_fiemap() does not take care of
this case if the file isize is extended by truncate.
It happens to mark _LAST only based on i_size_read(). It seems only ext*
family and hpfs filesystem seems to be using this currently.
And gfs2 was the user of this api till sometime back before it finally
moved to use iomap_fiemap() api.



I think we need the fiemap variant of the iomap_begin functions to pass
a flag in the iomap that the fiemap implementation can pick up.

Sure, let me do some digging on this one. One challenge which I think 
would be for filesystems to tell *efficiently* on whether this is the
last extent or not (without checking on every iomap_begin call about,
if there exist a next extent on the disk by doing more I/O - that's why
*efficiently*).

If done then -
we could use IOMAP_FIEMAP as the flag to pass to iomap_begin functions
and it could return us the iomap->type marked with IOMAP_EXTENT_LAST 
which could represent that this is actually the last extent on disk for
this inode.


-ritesh


--D

+                       flags |= FIEMAP_EXTENT_LAST;
+               ret = iomap_to_fiemap(fi, &ctx.prev, flags);
                 if (ret < 0)
                         return ret;
         }


-ritesh





But if the user
        requested for more fm_length which could not be mapped in
the last
        fiemap_extent, then the flag is not set.

Yes... if there were more extents to map than there was space in the map
array, then _LAST should remain unset to encourage userspace to come
back for the rest of the mappings.

(Unless maybe I'm misunderstanding here...)

e.g. output for above differences 2.a & 2.b
===========================================
create a file with below cmds.
1. fallocate -o 0 -l 8K testfile.txt
2. xfs_io -c "pwrite 8K 8K" testfile.txt
3. check extent mapping:- xfs_io -c "fiemap -v" testfile.txt
4. run this binary on with and without these patches:- ./a.out
(test_fiemap_diff.c) [4]

o/p of xfs_io -c "fiemap -v"
============================================
With this patch on patched kernel:-
testfile.txt:
   EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
     0: [0..15]:         122802736..122802751    16 0x800
     1: [16..31]:        122687536..122687551    16   0x1

without patch on vanilla kernel (no difference):-
testfile.txt:
   EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
     0: [0..15]:         332211376..332211391    16 0x800
     1: [16..31]:        332722392..332722407    16   0x1


o/p of a.out without patch:-
================
riteshh-> ./a.out
logical: [       0..      15] phys: 332211376..332211391 flags:
0x800 tot: 16
(0) extent flag = 2048

o/p of a.out with patch (both point 2.a & 2.b could be seen)
=======================
riteshh-> ./a.out
logical: [       0..       7] phys: 122802736..122802743 flags:
0x801 tot: 8
(0) extent flag = 2049

FYI - In test_fiemap_diff.c test we had
a. fm_extent_count = 1
b. fm_start = 0
c. fm_length = 4K
Whereas when we change fm_extent_count = 32, then we don't see
any difference.

e.g. output for above difference listed in point 1.b
====================================================

o/p without patch (block no returned for unwritten block as well)
=========Testing IOCTL FIBMAP=========
File size = 16384, blkcnt = 4, blocksize = 4096
    0   41526422
    1   41526423
    2   41590299
    3   41590300

o/p with patch (0 returned for unwritten block)
=========Testing IOCTL FIBMAP=========
File size = 16384, blkcnt = 4, blocksize = 4096
    0          0          0
    1          0          0
    2   15335942      29953
    3   15335943      29953


Summary:-
========
Due to some of the observational differences to user, listed above,
requesting to please help with a careful review in moving this to iomap.
Digging into some older threads, it looks like these differences
should be fine,
since the same tools have been working fine with XFS (which uses
iomap based
implementation) [1]
Also as Ted suggested in [3]: Fiemap & bmap spec could be made
based on the ext4
implementation. But since all the tools also work with xfs which
uses iomap
based fiemap, so we should be good there.

<nod> Thanks for the worked example output. :)

Thanks for the review. :)

ritesh



--D


References of some previous discussions:
=======================================
[1]: https://www.spinics.net/lists/linux-fsdevel/msg128370.html
[2]: https://www.spinics.net/lists/linux-fsdevel/msg127675.html
[3]: https://www.spinics.net/lists/linux-fsdevel/msg128368.html
[4]: https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/tools/test_fiemap_diff.c

[RFCv1]: https://www.spinics.net/lists/linux-ext4/msg67077.html


Ritesh Harjani (4):
    ext4: Add IOMAP_F_MERGED for non-extent based mapping
    ext4: Optimize ext4_ext_precache for 0 depth
    ext4: Move ext4 bmap to use iomap infrastructure.
    ext4: Move ext4_fiemap to use iomap infrastructure

   fs/ext4/extents.c | 288 +++++++---------------------------------------
   fs/ext4/inline.c  |  41 -------
   fs/ext4/inode.c   |   6 +-
   3 files changed, 49 insertions(+), 286 deletions(-)

--
2.21.0