On 2019/12/16 17:38, Hannes Reinecke wrote:
> On 12/12/19 7:38 PM, Damien Le Moal wrote:
>> Add the new file Documentation/filesystems/zonefs.txt to document zonefs
>> principles and user-space tool usage.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
>> ---
>>  Documentation/filesystems/zonefs.txt | 150 +++++++++++++++++++++++++++
>>  MAINTAINERS                          |   1 +
>>  2 files changed, 151 insertions(+)
>>  create mode 100644 Documentation/filesystems/zonefs.txt
>>
>> diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
>> new file mode 100644
>> index 000000000000..e5d798f4087d
>> --- /dev/null
>> +++ b/Documentation/filesystems/zonefs.txt
>> @@ -0,0 +1,150 @@
>> +ZoneFS - Zone filesystem for Zoned block devices
>> +
>> +Overview
>> +========
>> +
>> +zonefs is a very simple file system exposing each zone of a zoned block device
>> +as a file. Unlike a regular file system with zoned block device support (e.g.
>> +f2fs), zonefs does not hide the sequential write constraint of zoned block
>> +devices from the user. Files representing sequential write zones of the device
>> +must be written sequentially starting from the end of the file (append only
>> +writes).
>> +
>> +As such, zonefs is in essence closer to a raw block device access interface
>> +than to a full featured POSIX file system. The goal of zonefs is to simplify
>> +the implementation of zoned block devices support in applications by replacing
>> +raw block device file accesses with a richer file API, avoiding relying on
>> +direct block device file ioctls which may be more obscure to developers. One
>> +example of this approach is the implementation of LSM (log-structured merge)
>> +tree structures (such as used in RocksDB and LevelDB) on zoned block devices by
>> +allowing SSTables to be stored in a zone file similarly to a regular file system
>> +rather than as a range of sectors of the entire disk.
>> +The introduction of the higher level construct "one file is one zone" can
>> +help reduce the amount of changes needed in the application as well as
>> +introducing support for different application programming languages.
>> +
>> +zonefs on-disk metadata is reduced to a super block which persistently stores a
>> +magic number and optional features flags and values. On mount, zonefs uses
>> +blkdev_report_zones() to obtain the device zone configuration and populates
>> +the mount point with a static file tree solely based on this information.
>> +E.g. file sizes come from the device zone type and write pointer offset managed
>> +by the device itself.
>> +
>> +The zone files created on mount have the following characteristics.
>> +1) Files representing zones of the same type are grouped together
>> +   under the same sub-directory:
>> +  * For conventional zones, the sub-directory "cnv" is used.
>> +  * For sequential write zones, the sub-directory "seq" is used.
>> +  These two directories are the only directories that exist in zonefs. Users
>> +  cannot create other directories and cannot rename nor delete the "cnv" and
>> +  "seq" sub-directories.
>> +2) The name of zone files is the number of the file within the zone type
>> +   sub-directory, in order of increasing zone start sector.
>> +3) The size of conventional zone files is fixed to the device zone size.
>> +   Conventional zone files cannot be truncated.
>> +4) The size of sequential zone files represents the file's zone write pointer
>> +   position relative to the zone start sector. Truncating these files is
>> +   allowed only down to 0, in which case, the zone is reset to rewind the file
>> +   zone write pointer position to the start of the zone, or up to the zone size,
>> +   in which case the file's zone is transitioned to the FULL state (finish zone
>> +   operation).
>> +5) All read and write operations to files are not allowed beyond the file zone
>> +   size.
>> +   Any access exceeding the zone size fails with the -EFBIG error.
>> +6) Creating, deleting, renaming or modifying any attribute of files and
>> +   sub-directories is not allowed.
>> +
>> +Several optional features of zonefs can be enabled at format time.
>> +* Conventional zone aggregation: ranges of contiguous conventional zones can be
>> +  aggregated into a single larger file instead of the default one file per zone.
>> +* File ownership: The owner UID and GID of zone files is by default 0 (root)
>> +  but can be changed to any valid UID/GID.
>> +* File access permissions: the default 640 access permissions can be changed.
>> +
>
> Please mention the 'direct writes only to sequential zones' restriction.

Yes, indeed, this is missing. Will add it.

>
> Cheers,
>
> Hannes
>

-- 
Damien Le Moal
Western Digital Research
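
For readers following the thread, the size, truncation and append-only rules
quoted above can be restated as a small user-space model. This is only a sketch
of the rules as worded in the patch, not the zonefs implementation: the names
zone_kind, trunc_effect, model_truncate(), model_check_io() and
model_seq_write_ok() are hypothetical, and no device or mount point is involved.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types for illustration; these are not kernel definitions. */
enum zone_kind { ZONE_CNV, ZONE_SEQ };
enum trunc_effect { TRUNC_DENIED, TRUNC_RESET_ZONE, TRUNC_FINISH_ZONE };

/*
 * Rules 3 and 4: conventional zone files cannot be truncated. Sequential
 * zone files can only be truncated down to 0 (zone reset) or up to the
 * zone size (zone finish).
 */
enum trunc_effect model_truncate(enum zone_kind kind, uint64_t zone_size,
                                 uint64_t new_size)
{
    if (kind != ZONE_SEQ)
        return TRUNC_DENIED;
    if (new_size == 0)
        return TRUNC_RESET_ZONE;
    if (new_size == zone_size)
        return TRUNC_FINISH_ZONE;
    return TRUNC_DENIED;
}

/* Rule 5: any access extending beyond the zone size fails with -EFBIG. */
int model_check_io(uint64_t offset, uint64_t len, uint64_t zone_size)
{
    /* Written this way to avoid overflow in offset + len. */
    if (offset > zone_size || len > zone_size - offset)
        return -EFBIG;
    return 0;
}

/*
 * Overview: sequential zone files are append only, so a write must start
 * at the current file size (the zone write pointer position).
 */
bool model_seq_write_ok(uint64_t file_size, uint64_t offset)
{
    return offset == file_size;
}
```

With a 4096-byte zone, for example, model_truncate(ZONE_SEQ, 4096, 512) is
denied, while truncating to 0 or to 4096 maps to a zone reset or a zone finish.
The patch text does not say which error a denied truncate returns, so the model
deliberately leaves that unspecified.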