On 2019/12/16 17:36, Hannes Reinecke wrote: [...] >> +static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, >> + unsigned int flags, struct iomap *iomap, >> + struct iomap *srcmap) >> +{ >> + struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb); >> + struct zonefs_inode_info *zi = ZONEFS_I(inode); >> + loff_t max_isize = zi->i_max_size; >> + loff_t isize; >> + >> + /* >> + * For sequential zones, enforce direct IO writes. This is already >> + * checked when writes are issued, so warn about this here if we >> + * get buffered write to a sequential file inode. >> + */ >> + if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ && >> + (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT))) >> + return -EIO; >> + >> + /* >> + * For all zones, all blocks are always mapped. For sequential zones, >> + * all blocks after the write pointer (inode size) are always unwritten. >> + */ >> + mutex_lock(&zi->i_truncate_mutex); >> + isize = i_size_read(inode); >> + if (offset >= isize) { >> + length = min(length, max_isize - offset); >> + if (zi->i_ztype == ZONEFS_ZTYPE_CNV) >> + iomap->type = IOMAP_MAPPED; >> + else >> + iomap->type = IOMAP_UNWRITTEN; >> + } else { >> + length = min(length, isize - offset); >> + iomap->type = IOMAP_MAPPED; >> + } >> + mutex_unlock(&zi->i_truncate_mutex); >> + >> + iomap->offset = offset & (~sbi->s_blocksize_mask); >> + iomap->length = ((offset + length + sbi->s_blocksize_mask) & >> + (~sbi->s_blocksize_mask)) - iomap->offset; >> + iomap->bdev = inode->i_sb->s_bdev; >> + iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset; >> + >> + return 0; >> +} >> + >> +static const struct iomap_ops zonefs_iomap_ops = { >> + .iomap_begin = zonefs_iomap_begin, >> +}; >> + > This probably shows my complete ignorance, but what is the effect on > enforcing the direct I/O writes on the pagecache? > IE what happens for buffered reads? Will the pages be invalidated when a > write has been issued? Yes, a direct write issued to a file range that has cached pages result in these pages to be invalidated. But note that in the case of zonefs, this can happen only in the case of conventional zones. For sequential zones, this does not happen: reads can be buffered and cache pages but only for pages below the write pointer. And writes can only be issued at the write pointer. So there is never any possible overlap between buffered reads and direct writes. > Or do we simply rely on upper layers to ensure no concurrent buffered > and direct I/O is being made? Nope. VFS, or the file system specific implementation, takes care of that. See generic_file_direct_write() and its call to invalidate_inode_pages2_range(). > > [ .. ] >> + >> +static int zonefs_seq_file_truncate(struct inode *inode, loff_t isize) >> +{ >> + struct zonefs_inode_info *zi = ZONEFS_I(inode); >> + loff_t old_isize; >> + enum req_opf op; >> + int ret = 0; >> + >> + /* >> + * For sequential zone files, we can only allow truncating to 0 size, >> + * which is equivalent to a zone reset, or to the maximum file size, >> + * which is equivalent toa zone finish. > > Spelling: to a Good catch. Will fix it. Thanks. > > [ .. ] > > Other than that: > Reviewed-by: Hannes Reinecke <hare@xxxxxxx> > > Cheers, > > Hannes > -- Damien Le Moal Western Digital Research