Re: [PATCH v2 2/7] iomap: Add zero unwritten mappings dio support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




However, I still think that we should be able to atomic write mixed extents,
even though it is a pain to implement. To that end, I could be convinced
again that we don't require it...

Well... if you /did/ add a few entries to include/uapi/linux/fs.h for
ways that an untorn write can fail, then we could define the programming
interface as so:

"If you receive -EBADMAP, then call fallocate(FALLOC_FL_MAKE_OVERWRITE)
to force all the mappings to pure overwrites."

...since there have been a few people who have asked about that ability
so that they can write+fdatasync without so much overhead from file
metadata updates.

ok, I see.

All this does seem more complicated in terms of implementation and user experience than what I have in this series. But if you think that there is value in FALLOC_FL_MAKE_OVERWRITE for other scenarios, then maybe it could be good, though.



Instead here we are adding a bunch of complexity, and not even all that
well:

Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
   fs/iomap/direct-io.c  | 76 +++++++++++++++++++++++++++++++++++++++++++
   include/linux/iomap.h |  3 ++
   2 files changed, 79 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 23fdad16e6a8..18c888f0c11f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -805,6 +805,82 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
   }
   EXPORT_SYMBOL_GPL(iomap_dio_rw);
+static loff_t
+iomap_dio_zero_unwritten_iter(struct iomap_iter *iter, struct iomap_dio *dio)
+{
+	const struct iomap *iomap = &iter->iomap;
+	loff_t length = iomap_length(iter);
+	loff_t pos = iter->pos;
+
+	if (iomap->type == IOMAP_UNWRITTEN) {
+		int ret;
+
+		dio->flags |= IOMAP_DIO_UNWRITTEN;
+		ret = iomap_dio_zero(iter, dio, pos, length);

Shouldn't this be detecting the particular case that the mapping for the
kiocb is in mixed state and only zeroing in that case?  This just
targets every unwritten extent, even if the unwritten extent covered the
entire range that is being written.

Right, so I did touch on this in the final comment in patch 4/7 commit log.

Why I did it this way? I did not think that it made much difference, since
this zeroing would be generally a one-off and did not merit even more
complexity to implement.

The trouble is, if you fallocate the whole file and then write an
aligned 64k block, this will write zeroes to the block, update the
mapping, and only then issue the untorn write.  Sure that's a one time
performance hit, but probably not a welcome one.

ok, I can try to improve on this. It might get considerably more complicated...


It doesn't handle COW, it doesn't

Do we want to atomic write COW?

I don't see why not -- if there's a single COW mapping for the whole
untorn write, then the data gets written to the media in an untorn
fashion, and the remap is a single transaction.

I tested atomic write on COW and it works ok, but the behavior is odd to me.

If I attempt to atomic write a single block in shared extent, then we have this callchain: xfs_file_dio_write_atomic() -> iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) -> ... xfs_direct_write_iomap_begin() -> xfs_reflink_allocate_cow() and we alloc a new extent.

And so xfs_file_dio_write_atomic() -> iomap_dio_rw(IOMAP_DIO_OVERWRITE_ONLY) does not return -EAGAIN and we don't even attempt to zero.

I just wonder why IOMAP_DIO_OVERWRITE_ONLY is not honoured here, as xfs_reflink_allocate_cow() -> xfs_reflink_fill_cow_hole() -> xfs_bmap_write() can alloc new blocks.

I am not too concerned about atomic writing mixed extents which includes COW extents, as atomic writing mixed extents is based on "big alloc" and that does not enable reflink (yet).


handle holes, etc.

I did test hole, and it seemed to work. However I noticed that for a hole
region we get IOMAP_UNWRITTEN type, like:

Oh right, that's ->iomap_begin allocating an unwritten extent into the
hole, because you're not allowed to specify a hole for the destination
of a write.  I withdraw that part of the comment.




Thanks,
John




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux