On 2020/07/20 17:52, Johannes Thumshirn wrote:
> In the zoned storage model, the sectors within a zone are typically all
> writeable. With the introduction of the Zoned Namespace (ZNS) Command
> Set in the NVM Express organization, the model was extended to have a
> specific writeable capacity.
>
> This zone capacity can be less than the overall zone size for a NVMe ZNS
> device. For other zoned block devices like ZBC or null_blk in zoned-mode
> the zone capacity is always equal to the zone size.

null_blk now has the zone_capacity option to emulate the smaller zone
capacities of ZNS devices. But that option applies to sequential zones only:
null_blk conventional zones always have a capacity equal to the zone size.
Is that what you meant to say here?

>
> Use the zone capacity field instead from blk_zone for determining the
> maximum inode size and inode blocks in zonefs.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
> ---
>  fs/zonefs/super.c  | 11 +++++++----
>  fs/zonefs/zonefs.h |  3 +++
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index b13c332a3513..337249f98cae 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -335,7 +335,7 @@ static void zonefs_io_error(struct inode *inode, bool write)
>  	struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
>  	unsigned int noio_flag;
>  	unsigned int nr_zones =
> -		zi->i_max_size >> (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
> +		zi->i_zone_size >> (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
>  	struct zonefs_ioerr_data err = {
>  		.inode = inode,
>  		.write = write,
> @@ -398,7 +398,7 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
>  		goto unlock;
>
>  	ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
> -			       zi->i_max_size >> SECTOR_SHIFT, GFP_NOFS);
> +			       zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
>  	if (ret) {
>  		zonefs_err(inode->i_sb,
>  			   "Zone management operation at %llu failed %d",
> @@ -1051,14 +1051,16 @@ static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
>
>  	zi->i_ztype = type;
>  	zi->i_zsector = zone->start;
> +	zi->i_zone_size = zone->len << SECTOR_SHIFT;
> +
>  	zi->i_max_size = min_t(loff_t, MAX_LFS_FILESIZE,
> -			       zone->len << SECTOR_SHIFT);
> +			       zone->capacity << SECTOR_SHIFT);
>  	zi->i_wpoffset = zonefs_check_zone_condition(inode, zone, true, true);
>
>  	inode->i_uid = sbi->s_uid;
>  	inode->i_gid = sbi->s_gid;
>  	inode->i_size = zi->i_wpoffset;
> -	inode->i_blocks = zone->len;
> +	inode->i_blocks = zi->i_max_size >> SECTOR_SHIFT;
>
>  	inode->i_op = &zonefs_file_inode_operations;
>  	inode->i_fop = &zonefs_file_operations;
> @@ -1169,6 +1171,7 @@ static int zonefs_create_zgroup(struct zonefs_zone_data *zd,
>  				else if (next->cond == BLK_ZONE_COND_OFFLINE)
>  					zone->cond = BLK_ZONE_COND_OFFLINE;
>  			}
> +			zone->capacity = zone->len;
>  		}

Normally, conventional zones on all known zoned devices will always have a
zone capacity equal to the zone size. But I would rather check that this is
the case here, as the AGGRCNV option can only work if the zone capacity is
equal to the zone size. So something like:

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index abfb17f88f9a..db4853c7ec75 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1164,12 +1164,17 @@ static int zonefs_create_zgroup(struct zonefs_zone_data *zd,
 				if (zonefs_zone_type(next) != type)
 					break;
 				zone->len += next->len;
+				zone->capacity += next->capacity;
 				if (next->cond == BLK_ZONE_COND_READONLY &&
 				    zone->cond != BLK_ZONE_COND_OFFLINE)
 					zone->cond = BLK_ZONE_COND_READONLY;
 				else if (next->cond == BLK_ZONE_COND_OFFLINE)
 					zone->cond = BLK_ZONE_COND_OFFLINE;
 			}
+			if (zone->capacity != zone->len) {
+				zonefs_err(sb, "Invalid conventional zone capacity\n");
+				ret = -EINVAL;
+			}
 		}

would be better.
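
To illustrate the idea outside of the kernel, here is a minimal user-space
sketch of the same logic: sum both the size and the capacity while aggregating
contiguous conventional zones, reject the result if the two differ, and derive
the maximum file size from the capacity. Note that struct zone_desc,
aggregate_cnv_zones() and zone_file_max_size() are hypothetical names made up
for this example. They are not zonefs or block layer code, and the sketch
ignores zone conditions and the MAX_LFS_FILESIZE clamp.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Hypothetical stand-in for the struct blk_zone fields used here. */
struct zone_desc {
	uint64_t start;		/* zone start sector */
	uint64_t len;		/* zone size in sectors */
	uint64_t capacity;	/* writable sectors in the zone */
};

/*
 * Merge zones[0..nr) into *agg by summing sizes and capacities.
 * Return 0 on success, or -EINVAL if there is nothing to merge or if
 * the aggregated capacity differs from the aggregated size.
 */
static int aggregate_cnv_zones(const struct zone_desc *zones, size_t nr,
			       struct zone_desc *agg)
{
	size_t i;

	if (!nr)
		return -EINVAL;

	*agg = zones[0];
	for (i = 1; i < nr; i++) {
		agg->len += zones[i].len;
		agg->capacity += zones[i].capacity;
	}

	/* Aggregation only makes sense if every sector is writable. */
	if (agg->capacity != agg->len)
		return -EINVAL;

	return 0;
}

/* Maximum file size in bytes, derived from the capacity, not the size. */
static uint64_t zone_file_max_size(const struct zone_desc *z)
{
	return z->capacity << SECTOR_SHIFT;
}
```

With two 524288-sector zones of full capacity, the aggregate is accepted and
the maximum file size is 1048576 sectors. If either zone reports a smaller
capacity, the aggregation fails with -EINVAL instead of silently exposing
unwritable sectors in the file.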
>
>  	/*
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index ad17fef7ce91..55b39970acb2 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -56,6 +56,9 @@ struct zonefs_inode_info {
>  	/* File maximum size */
>  	loff_t			i_max_size;
>
> +	/* File zone size */
> +	loff_t			i_zone_size;
> +
>  	/*
>  	 * To serialise fully against both syscall and mmap based IO and
>  	 * sequential file truncation, two locks are used. For serializing
>

--
Damien Le Moal
Western Digital Research