Re: [PATCH 1/2] fs: New zonefs file system

Hannes Reinecke <hare@xxxxxxx> · Tue, 17 Dec 2019 08:28:33 +0100

On 12/17/19 1:20 AM, Damien Le Moal wrote:
On 2019/12/16 17:36, Hannes Reinecke wrote:
[...]
+static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+			      unsigned int flags, struct iomap *iomap,
+			      struct iomap *srcmap)
+{
+	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	loff_t max_isize = zi->i_max_size;
+	loff_t isize;
+
+	/*
+	 * For sequential zones, enforce direct IO writes. This is already
+	 * checked when writes are issued, so warn about this here if we
+	 * get buffered write to a sequential file inode.
+	 */
+	if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
+			 (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
+		return -EIO;
+
+	/*
+	 * For all zones, all blocks are always mapped. For sequential zones,
+	 * all blocks after the write pointer (inode size) are always unwritten.
+	 */
+	mutex_lock(&zi->i_truncate_mutex);
+	isize = i_size_read(inode);
+	if (offset >= isize) {
+		length = min(length, max_isize - offset);
+		if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
+			iomap->type = IOMAP_MAPPED;
+		else
+			iomap->type = IOMAP_UNWRITTEN;
+	} else {
+		length = min(length, isize - offset);
+		iomap->type = IOMAP_MAPPED;
+	}
+	mutex_unlock(&zi->i_truncate_mutex);
+
+	iomap->offset = offset & (~sbi->s_blocksize_mask);
+	iomap->length = ((offset + length + sbi->s_blocksize_mask) &
+			 (~sbi->s_blocksize_mask)) - iomap->offset;
+	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
+
+	return 0;
+}
+
+static const struct iomap_ops zonefs_iomap_ops = {
+	.iomap_begin	= zonefs_iomap_begin,
+};
+
This probably shows my complete ignorance, but what is the effect on
enforcing the direct I/O writes on the pagecache?
IE what happens for buffered reads? Will the pages be invalidated when a
write has been issued?

Yes, a direct write issued to a file range that has cached pages result
in these pages to be invalidated. But note that in the case of zonefs,
this can happen only in the case of conventional zones. For sequential
zones, this does not happen: reads can be buffered and cache pages but
only for pages below the write pointer. And writes can only be issued at
the write pointer. So there is never any possible overlap between
buffered reads and direct writes.

Oh, indeed, you are correct. That's indeed easy then.

Or do we simply rely on upper layers to ensure no concurrent buffered
and direct I/O is being made?

Nope. VFS, or the file system specific implementation, takes care of
that. See generic_file_direct_write() and its call to
invalidate_inode_pages2_range().

Of course.
One could even say: not applicable, as it won't happen.

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer