Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 12, 2024 at 10:40:15AM +0000, John Garry wrote:
> On 11/12/2024 23:47, Darrick J. Wong wrote:
> > On Tue, Dec 10, 2024 at 12:57:32PM +0000, John Garry wrote:
> > > For atomic writes support, it is required to only ever submit a single bio
> > > (for an atomic write).
> > > 
> > > Furthermore, currently the atomic write unit min and max limit is fixed at
> > > the FS block size.
> > > 
> > > For lifting the atomic write unit max limit, it may occur that an atomic
> > > write spans mixed unwritten and mapped extents. For this case, due to the
> > > iterative nature of iomap, multiple bios would be produced, which is
> > > intolerable.
> > > 
> > > Add a function to zero unwritten extents in a certain range, which may be
> > > used to ensure that unwritten extents are zeroed prior to issuing of an
> > > atomic write.
> > 
> > I still dislike this.  IMO block untorn writes _is_ a niche feature for
> > programs that perform IO in large blocks.  Any program that wants a
> > general "apply all these updates or none of them" interface should use
> > XFS_IOC_EXCHANGE_RANGE since it has no awu_max restrictions, can handle
> > discontiguous update ranges, doesn't require block alignment, etc.
> 
> That is not a problem which I am trying to solve. Indeed atomic writes are
> for fixed blocks of data and not atomic file updates.

Agreed.

> However, I still think that we should be able to atomic write mixed extents,
> even though it is a pain to implement. To that end, I could be convinced
> again that we don't require it...

Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
ways that an untorn write can fail, then we could define the programming
interface as so:

"If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
to force all the mappings to pure overwrites."

...since there have been a few people who have asked about that ability
so that they can write+fdatasync without so much overhead from file
metadata updates.

> > 
> > Instead here we are adding a bunch of complexity, and not even all that
> > well:
> > 
> > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> > > ---
> > >   fs/iomap/direct-io.c  | 76 +++++++++++++++++++++++++++++++++++++++++++
> > >   include/linux/iomap.h |  3 ++
> > >   2 files changed, 79 insertions(+)
> > > 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 23fdad16e6a8..18c888f0c11f 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > >   }
> > >   EXPORT_SYMBOL_GPL(iomap_dio_rw);
> > > +static loff_t
> > > +iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
> > > +{
> > > +	const struct iomap *iomap = &iter->iomap;
> > > +	loff_t length = iomap_length(iter);
> > > +	loff_t pos = iter->pos;
> > > +
> > > +	if (iomap->type == IOMAP_UNWRITTEN) {
> > > +		int ret;
> > > +
> > > +		dio->flags |= IOMAP_DIO_UNWRITTEN;
> > > +		ret = iomap_dio_zero(iter, dio, pos, length);
> > 
> > Shouldn't this be detecting the particular case that the mapping for the
> > kiocb is in mixed state and only zeroing in that case?  This just
> > targets every unwritten extent, even if the unwritten extent covered the
> > entire range that is being written.
> 
> Right, so I did touch on this in the final comment in patch 4/7 commit log.
> 
> Why I did it this way? I did not think that it made much difference, since
> this zeroing would be generally a one-off and did not merit even more
> complexity to implement.

The trouble is, if you fallocate the whole file and then write an
aligned 64k block, this will write zeroes to the block, update the
mapping, and only then issue the untorn write.  Sure that's a one time
performance hit, but probably not a welcome one.

> > It doesn't handle COW, it doesn't
> 
> Do we want to atomic write COW?

I don't see why not -- if there's a single COW mapping for the whole
untorn write, then the data gets written to the media in an untorn
fashion, and the remap is a single transaction.

> > handle holes, etc.
> 
> I did test hole, and it seemed to work. However I noticed that for a hole
> region we get IOMAP_UNWRITTEN type, like:

Oh right, that's ->iomap_begin allocating an unwritten extent into the
hole, because you're not allowed to specify a hole for the destination
of a write.  I withdraw that part of the comment.

> # xfs_bmap -vvp /root/mnt/file
> /root/mnt/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..127]:        192..319          0 (192..319)         128 000000
>    1: [128..255]:      hole                                   128
>    2: [256..383]:      448..575          0 (448..575)         128 000000
>  FLAG Values:
>     0100000 Shared extent
>     0010000 Unwritten preallocated extent
>     0001000 Doesn't begin on stripe unit
>     0000100 Doesn't end   on stripe unit
>     0000010 Doesn't begin on stripe width
>     0000001 Doesn't end   on stripe width
> #
> #
> 
> For an atomic wrote of length 65536 @ offset 65536.
> 
> Any hint on how to create a iomap->type == IOMAP_HOLE?
> 
> > Also, can you make a version of blkdev_issue_zeroout that returns the
> > bio so the caller can issue them asynchronously instead of opencoding
> > the bio_alloc loop in iomap_dev_zero?
> 
> ok, fine
> 
> > 
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > > +	dio->size += length;
> > > +
> > > +	return length;
> > > +}
> > > +
> > > +ssize_t
> > > +iomap_dio_zero_unwritten(struct kiocb *iocb, struct iov_iter *iter,
> > > +		const struct iomap_ops *ops, const struct iomap_dio_ops *dops)
> > > +{
> > > +	struct inode *inode = file_inode(iocb->ki_filp);
> > > +	struct iomap_dio *dio;
> > > +	ssize_t ret;
> > > +	struct iomap_iter iomi = {
> > > +		.inode		= inode,
> > > +		.pos		= iocb->ki_pos,
> > > +		.len		= iov_iter_count(iter),
> > > +		.flags		= IOMAP_WRITE,
> > 
> > IOMAP_WRITE | IOMAP_DIRECT, no?
> 
> yes, I'll fix that.
> 
> And I should also set IOMAP_DIO_WRITE for dio->flags.

<nod>

--D

> Thanks,
> John
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux