On 12/12/19 7:38 PM, Damien Le Moal wrote:
Add the new file Documentation/filesystems/zonefs.txt to document zonefs
principles and user-space tool usage.

Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
---
 Documentation/filesystems/zonefs.txt | 150 +++++++++++++++++++++++++++
 MAINTAINERS                          |   1 +
 2 files changed, 151 insertions(+)
 create mode 100644 Documentation/filesystems/zonefs.txt

diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
new file mode 100644
index 000000000000..e5d798f4087d
--- /dev/null
+++ b/Documentation/filesystems/zonefs.txt
@@ -0,0 +1,150 @@
+ZoneFS - Zone filesystem for Zoned block devices
+
+Overview
+========
+
+zonefs is a very simple file system exposing each zone of a zoned block device
+as a file. Unlike a regular file system with zoned block device support (e.g.
+f2fs), zonefs does not hide the sequential write constraint of zoned block
+devices from the user. Files representing sequential write zones of the device
+must be written sequentially starting from the end of the file (append only
+writes).
+
+As such, zonefs is in essence closer to a raw block device access interface
+than to a full featured POSIX file system. The goal of zonefs is to simplify
+the implementation of zoned block device support in applications by replacing
+raw block device file accesses with a richer file API, avoiding relying on
+direct block device file ioctls which may be more obscure to developers. One
+example of this approach is the implementation of LSM (log-structured merge)
+tree structures (such as used in RocksDB and LevelDB) on zoned block devices by
+allowing SSTables to be stored in a zone file similarly to a regular file system
+rather than as a range of sectors of the entire disk. The introduction of the
+higher level construct "one file is one zone" can help reduce the amount of
+changes needed in the application as well as introducing support for different
+application programming languages.
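
The append-only rule in the overview above can be illustrated from user space.
The following is a minimal sketch, not code from the patch: zone_file_open()
and zone_file_append() are hypothetical helper names, and the sketch assumes
(as the quoted text states further down) that the size of a sequential zone
file mirrors the zone write pointer. Open flags are left to the caller, since
the review comment at the end of this message indicates that writes to
sequential zones may have to be direct I/O.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of zonefs): open a zone file for writing.
 * extra_flags lets the caller add flags such as O_DIRECT if the file
 * system requires direct writes.
 */
static int zone_file_open(const char *path, int extra_flags)
{
	assert(path != NULL);
	return open(path, O_WRONLY | extra_flags);
}

/*
 * Hypothetical helper (not part of zonefs): append len bytes at the current
 * end of the file. Assumption: the file size equals the write pointer
 * offset, so the end of the file is the only valid write position.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t zone_file_append(int fd, const void *buf, size_t len)
{
	struct stat st;

	assert(buf != NULL);

	if (fstat(fd, &st) < 0)
		return -1;

	/* Write exactly at the current file size, never before it. */
	return pwrite(fd, buf, len, st.st_size);
}
```

A caller would open a file under the "seq" sub-directory once and call
zone_file_append() for each record. With direct I/O the buffer and length
would additionally need to satisfy the device alignment requirements, which
this sketch does not handle.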
+
+zonefs on-disk metadata is reduced to a super block which persistently stores a
+magic number and optional feature flags and values. On mount, zonefs uses
+blkdev_report_zones() to obtain the device zone configuration and populates
+the mount point with a static file tree solely based on this information.
+E.g. file sizes come from the device zone type and write pointer offset managed
+by the device itself.
+
+The zone files created on mount have the following characteristics.
+1) Files representing zones of the same type are grouped together
+   under the same sub-directory:
+  * For conventional zones, the sub-directory "cnv" is used.
+  * For sequential write zones, the sub-directory "seq" is used.
+  These two directories are the only directories that exist in zonefs. Users
+  cannot create other directories and cannot rename nor delete the "cnv" and
+  "seq" sub-directories.
+2) The name of zone files is the number of the file within the zone type
+   sub-directory, in order of increasing zone start sector.
+3) The size of conventional zone files is fixed to the device zone size.
+   Conventional zone files cannot be truncated.
+4) The size of sequential zone files represents the file's zone write pointer
+   position relative to the zone start sector. Truncating these files is
+   allowed only down to 0, in which case the zone is reset to rewind the file
+   zone write pointer position to the start of the zone, or up to the zone size,
+   in which case the file's zone is transitioned to the FULL state (finish zone
+   operation).
+5) Read and write operations are not allowed beyond the file zone size. Any
+   access exceeding the zone size fails with the -EFBIG error.
+6) Creating, deleting, renaming or modifying any attribute of files and
+   sub-directories is not allowed.
+
+Several optional features of zonefs can be enabled at format time.
+* Conventional zone aggregation: ranges of contiguous conventional zones can be
+  aggregated into a single larger file instead of the default one file per zone.
+* File ownership: The owner UID and GID of zone files is by default 0 (root)
+  but can be changed to any valid UID/GID.
+* File access permissions: the default 640 access permissions can be changed.
+
Please mention the 'direct writes only to sequential zones' restriction.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer