On 8/21/19 9:03 AM, Damien Le Moal wrote:
> zonefs is a very simple file system exposing each zone of a zoned
> block device as a file. zonefs is in fact closer to a raw block device
> access interface than to a full feature POSIX file system.
>
> The goal of zonefs is to simplify implementation of zoned block device
> raw access by applications by allowing switching to the well known POSIX
> file API rather than relying on direct block device file ioctls and
> read/write. Zonefs, for instance, greatly simplifies the implementation
> of LSM (log-structured merge) tree structures (such as used in RocksDB
> and LevelDB) on zoned block devices by allowing SSTables to be stored in
> a zone file similarly to a regular file system architecture, hence
> reducing the amount of change needed in the application.
>
> Zonefs on-disk metadata is reduced to a super block to store a magic
> number, a uuid and optional features flags and values. On mount, zonefs
> uses blkdev_report_zones() to obtain the device zone configuration and
> populates the mount point with a static file tree solely based on this
> information. E.g. file sizes come from zone write pointer offset managed
> by the device itself.
>
> The zone files created on mount have the following characteristics.
> 1) Files representing zones of the same type are grouped together
>    under a common directory:
>    * For conventional zones, the directory "cnv" is used.
>    * For sequential write zones, the directory "seq" is used.
>    These two directories are the only directories that exist in zonefs.
>    Users cannot create other directories and cannot rename nor delete
>    the "cnv" and "seq" directories.
> 2) The name of zone files is by default the number of the file within
>    the zone type directory, in order of increasing zone start sector.
> 3) The size of conventional zone files is fixed to the device zone size.
>    Conventional zone files cannot be truncated.
> 4) The size of sequential zone files represents the file zone write
>    pointer position relative to the zone start sector. Truncating these
>    files is allowed only down to 0, in which case, the zone is reset to
>    rewind the file zone write pointer position to the start of the zone.
> 5) All read and write operations to files are not allowed beyond the
>    file zone size. Any access exceeding the zone size is failed with
>    the -EFBIG error.
> 6) Creating, deleting, renaming or modifying any attribute of files
>    and directories is not allowed. The only exception being the file
>    size of sequential zone files which can be modified by write
>    operations or truncation to 0.
>
> Several optional features of zonefs can be enabled at format time.
> * Conventional zone aggregation: contiguous conventional zones can be
>   aggregated into a single larger file instead of multiple per-zone
>   files.
> * File naming: the default file number file name can be switched to
>   using the base-10 value of the file zone start sector.
> * File ownership: The owner UID and GID of zone files is by default 0
>   (root) but can be changed to any valid UID/GID.
> * File access permissions: the default 640 access permissions can be
>   changed.
>
> The mkzonefs tool is used to format zonefs. This tool is available
> on Github at: git@xxxxxxxxxx:damien-lemoal/zonefs-tools.git.
> zonefs-tools includes a simple test suite which can be run against any
> zoned block device, including null_blk block device created with zoned
> mode.
>
> Example: the following formats a host-managed SMR HDD with the
> conventional zone aggregation feature enabled.
>
> mkzonefs -o aggr_cnv /dev/sdX
> mount -t zonefs /dev/sdX /mnt
> ls -l /mnt/
> total 0
> dr-xr-xr-x 2 root root 0 Apr 11 13:00 cnv
> dr-xr-xr-x 2 root root 0 Apr 11 13:00 seq
>
> ls -l /mnt/cnv
> total 137363456
> -rw-rw---- 1 root root 140660178944 Apr 11 13:00 0
>
> ls -Fal -v /mnt/seq
> total 14511243264
> dr-xr-xr-x 2 root root 15942528 Jul 10 11:53 ./
> drwxr-xr-x 4 root root     1152 Jul 10 11:53 ../
> -rw-r----- 1 root root        0 Jul 10 11:53 0
> -rw-r----- 1 root root 33554432 Jul 10 13:43 1
> -rw-r----- 1 root root        0 Jul 10 11:53 2
> -rw-r----- 1 root root        0 Jul 10 11:53 3
> ...
>
> The aggregated conventional zone file can be used as a regular file.
> Operations such as the following work.
>
> mkfs.ext4 /mnt/cnv/0
> mount -o loop /mnt/cnv/0 /data
>
> Contains contributions from Johannes Thumshirn <jthumshirn@xxxxxxx>
> and Christoph Hellwig <hch@xxxxxx>.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> ---
> Changes from v2:
> * Addressed comments from Darrick: Typo, added checksum to super block,
>   enhanced checks of the super block fields validity (used reserved bytes
>   and unknown features bits)
> * Rebased on XFS tree iomap-for-next branch
>
> Changes from v1:
> * Rebased on latest iomap branch iomap-5.4-merge of XFS tree at
>   git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
> * Addressed all comments from Dave Chinner and others
>
>  MAINTAINERS                |   10 +
>  fs/Kconfig                 |    2 +
>  fs/Makefile                |    1 +
>  fs/zonefs/Kconfig          |    9 +
>  fs/zonefs/Makefile         |    4 +
>  fs/zonefs/super.c          | 1083 ++++++++++++++++++++++++++++++++++++
>  fs/zonefs/zonefs.h         |  177 ++++++
>  include/uapi/linux/magic.h |    1 +
>  8 files changed, 1287 insertions(+)
>  create mode 100644 fs/zonefs/Kconfig
>  create mode 100644 fs/zonefs/Makefile
>  create mode 100644 fs/zonefs/super.c
>  create mode 100644 fs/zonefs/zonefs.h
>
[ .. ]
> @@ -261,6 +262,7 @@ source "fs/romfs/Kconfig"
>  source "fs/pstore/Kconfig"
>  source "fs/sysv/Kconfig"
>  source "fs/ufs/Kconfig"
> +source "fs/ufs/Kconfig"
>
>  endif # MISC_FILESYSTEMS
>
Hmm? Duplicate line?

> diff --git a/fs/Makefile b/fs/Makefile
> index d60089fd689b..7d3c90e1ad79 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -130,3 +130,4 @@ obj-$(CONFIG_F2FS_FS) += f2fs/
>  obj-$(CONFIG_CEPH_FS) += ceph/
>  obj-$(CONFIG_PSTORE) += pstore/
>  obj-$(CONFIG_EFIVAR_FS) += efivarfs/
> +obj-$(CONFIG_ZONEFS_FS) += zonefs/
> diff --git a/fs/zonefs/Kconfig b/fs/zonefs/Kconfig
> new file mode 100644
> index 000000000000..6490547e9763
> --- /dev/null
> +++ b/fs/zonefs/Kconfig
> @@ -0,0 +1,9 @@
> +config ZONEFS_FS
> +	tristate "zonefs filesystem support"
> +	depends on BLOCK
> +	depends on BLK_DEV_ZONED
> +	help
> +	  zonefs is a simple File System which exposes zones of a zoned block
> +	  device as files.
> +
> +	  If unsure, say N.
> diff --git a/fs/zonefs/Makefile b/fs/zonefs/Makefile
> new file mode 100644
> index 000000000000..75a380aa1ae1
> --- /dev/null
> +++ b/fs/zonefs/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_ZONEFS_FS) += zonefs.o
> +
> +zonefs-y := super.o
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> new file mode 100644
> index 000000000000..5521c21fd34b
> --- /dev/null
> +++ b/fs/zonefs/super.c
[ .. ]

That whole thing looks good to me (with my limited fs skills :-),
however, some things I'd like to have clarified:

- zone state handling:
  While you do have some handling for offline zones, I'm missing
  handling of them during normal I/O. Surely a zone can go offline via
  other means (like the admin calling nasty user-space programs), which
  then would result in an I/O error in the filesystem. Shouldn't we
  handle this case when doing error handling? I.e., shouldn't we look at
  the zone state when doing a REPORT ZONES, and update it if required?
  Similarly: How do we present zones which are not accessible?
  Will they still show up in the directory? I think they should, but we
  should be returning an error to userspace like EPERM or somesuch.
- zone sizes:
  From what I've seen, sequential zones can be appended to, i.e. they'll
  start off at 0 and will increase in size. Conventional zones, OTOH,
  apparently always have a fixed size. Is that correct?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
hare@xxxxxxx                                   +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
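
For reference, a minimal user-space sketch of the file size semantics asked
about above, going only by the patch description: the size of a file under
"seq" is said to be the zone write pointer offset from the zone start, and
truncating such a file to 0 is said to reset the zone. The helper names
zone_file_size() and zone_file_reset() are hypothetical and not part of the
patch; they only wrap stat(2) and truncate(2), so they behave the same way on
any regular file. Whether an offline or otherwise inaccessible zone makes
these calls fail, and with which errno, is exactly the open question and is
not assumed here.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the current size of the file at @path, or (off_t)-1 on error.
 * Assumption taken from the patch description: for a zonefs "seq" file
 * this size is the write pointer position relative to the zone start.
 */
off_t zone_file_size(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0) {
		perror("stat");
		return (off_t)-1;
	}
	return st.st_size;
}

/*
 * Truncate the file at @path to 0 bytes. Returns 0 on success, -1 on error.
 * Assumption taken from the patch description: on a zonefs "seq" file this
 * is the only permitted truncation and it resets the zone write pointer.
 */
int zone_file_reset(const char *path)
{
	assert(path != NULL);
	if (truncate(path, 0) != 0) {
		perror("truncate");
		return -1;
	}
	return 0;
}
```

With the mount from the example in the patch description, this would be
called as zone_file_size("/mnt/seq/1") followed by
zone_file_reset("/mnt/seq/1"); the path is only illustrative.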