Re: [PATCH V3] fs: New zonefs file system

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/21/19 9:03 AM, Damien Le Moal wrote:
> zonefs is a very simple file system exposing each zone of a zoned
> block device as a file. zonefs is in fact closer to a raw block device
> access interface than to a full feature POSIX file system.
> 
> The goal of zonefs is to simplify implementation of zoned block device
> raw access by applications by allowing switching to the well known POSIX
> file API rather than relying on direct block device file ioctls and
> read/write. Zonefs, for instance, greatly simplifies the implementation
> of LSM (log-structured merge) tree structures (such as used in RocksDB
> and LevelDB) on zoned block devices by allowing SSTables to be stored in
> a zone file similarly to a regular file system architecture, hence
> reducing the amount of change needed in the application.
> 
> Zonefs on-disk metadata is reduced to a super block to store a magic
> number, a uuid and optional features flags and values. On mount, zonefs
> uses blkdev_report_zones() to obtain the device zone configuration and
> populates the mount point with a static file tree solely based on this
> information. E.g. file sizes come from zone write pointer offset managed
> by the device itself.
> 
> The zone files created on mount have the following characteristics.
> 1) Files representing zones of the same type are grouped together
>    under a common directory:
>   * For conventional zones, the directory "cnv" is used.
>   * For sequential write zones, the directory "seq" is used.
>   These two directories are the only directories that exist in zonefs.
>   Users cannot create other directories and cannot rename nor delete
>   the "cnv" and "seq" directories.
> 2) The name of zone files is by default the number of the file within
>    the zone type directory, in order of increasing zone start sector.
> 3) The size of conventional zone files is fixed to the device zone size.
>    Conventional zone files cannot be truncated.
> 4) The size of sequential zone files represent the file zone write
>    pointer position relative to the zone start sector. Truncating these
>    files is allowed only down to 0, in wich case, the zone is reset to
>    rewind the file zone write pointer position to the start of the zone.
> 5) All read and write operations to files are not allowed beyond the
>    file zone size. Any access exceeding the zone size is failed with
>    the -EFBIG error.
> 6) Creating, deleting, renaming or modifying any attribute of files
>    and directories is not allowed. The only exception being the file
>    size of sequential zone files which can be modified by write
>    operations or truncation to 0.
> 
> Several optional features of zonefs can be enabled at format time.
> * Conventional zone aggregation: contiguous conventional zones can be
>   agregated into a single larger file instead of multiple per-zone
>   files.
> * File naming: the default file number file name can be switched to
>   using the base-10 value of the file zone start sector.
> * File ownership: The owner UID and GID of zone files is by default 0
>   (root) but can be changed to any valid UID/GID.
> * File access permissions: the default 640 access permissions can be
>   changed.
> 
> The mkzonefs tool is used to format zonefs. This tool is available
> on Github at: git@xxxxxxxxxx:damien-lemoal/zonefs-tools.git.
> zonefs-tools includes a simple test suite which can be run against any
> zoned block device, including null_blk block device created with zoned
> mode.
> 
> Example: the following formats a host-managed SMR HDD with the
> conventional zone aggregation feature enabled.
> 
> mkzonefs -o aggr_cnv /dev/sdX
> mount -t zonefs /dev/sdX /mnt
> ls -l /mnt/
> total 0
> dr-xr-xr-x 2 root root 0 Apr 11 13:00 cnv
> dr-xr-xr-x 2 root root 0 Apr 11 13:00 seq
> 
> ls -l /mnt/cnv
> total 137363456
> -rw-rw---- 1 root root 140660178944 Apr 11 13:00 0
> 
> ls -Fal -v /mnt/seq
> total 14511243264
> dr-xr-xr-x 2 root root 15942528 Jul 10 11:53 ./
> drwxr-xr-x 4 root root     1152 Jul 10 11:53 ../
> -rw-r----- 1 root root        0 Jul 10 11:53 0
> -rw-r----- 1 root root 33554432 Jul 10 13:43 1
> -rw-r----- 1 root root        0 Jul 10 11:53 2
> -rw-r----- 1 root root        0 Jul 10 11:53 3
> ...
> 
> The aggregated conventional zone file can be used as a regular file.
> Operations such as the following work.
> 
> mkfs.ext4 /mnt/cnv/0
> mount -o loop /mnt/cnv/0 /data
> 
> Contains contributions from Johannes Thumshirn <jthumshirn@xxxxxxx>
> and Christoph Hellwig <hch@xxxxxx>.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> ---
> Changes from v2:
> * Addressed comments from Darrick: Typo, added checksum to super block,
>   enhance cheks of the super block fields validity (used reserved bytes
>   and unknown features bits)
> * Rebased on XFS tree iomap-for-next branch
> 
> Changes from v1:
> * Rebased on latest iomap branch iomap-5.4-merge of XFS tree at
>   git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
> * Addressed all comments from Dave Chinner and others
> 
>  MAINTAINERS                |   10 +
>  fs/Kconfig                 |    2 +
>  fs/Makefile                |    1 +
>  fs/zonefs/Kconfig          |    9 +
>  fs/zonefs/Makefile         |    4 +
>  fs/zonefs/super.c          | 1083 ++++++++++++++++++++++++++++++++++++
>  fs/zonefs/zonefs.h         |  177 ++++++
>  include/uapi/linux/magic.h |    1 +
>  8 files changed, 1287 insertions(+)
>  create mode 100644 fs/zonefs/Kconfig
>  create mode 100644 fs/zonefs/Makefile
>  create mode 100644 fs/zonefs/super.c
>  create mode 100644 fs/zonefs/zonefs.h
> 
[ .. ]
> @@ -261,6 +262,7 @@ source "fs/romfs/Kconfig"
>  source "fs/pstore/Kconfig"
>  source "fs/sysv/Kconfig"
>  source "fs/ufs/Kconfig"
> +source "fs/ufs/Kconfig"
>  
>  endif # MISC_FILESYSTEMS
>  
Hmm?
Duplicate line?

> diff --git a/fs/Makefile b/fs/Makefile
> index d60089fd689b..7d3c90e1ad79 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -130,3 +130,4 @@ obj-$(CONFIG_F2FS_FS)		+= f2fs/
>  obj-$(CONFIG_CEPH_FS)		+= ceph/
>  obj-$(CONFIG_PSTORE)		+= pstore/
>  obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
> +obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
> diff --git a/fs/zonefs/Kconfig b/fs/zonefs/Kconfig
> new file mode 100644
> index 000000000000..6490547e9763
> --- /dev/null
> +++ b/fs/zonefs/Kconfig
> @@ -0,0 +1,9 @@
> +config ZONEFS_FS
> +	tristate "zonefs filesystem support"
> +	depends on BLOCK
> +	depends on BLK_DEV_ZONED
> +	help
> +	  zonefs is a simple File System which exposes zones of a zoned block
> +	  device as files.
> +
> +	  If unsure, say N.
> diff --git a/fs/zonefs/Makefile b/fs/zonefs/Makefile
> new file mode 100644
> index 000000000000..75a380aa1ae1
> --- /dev/null
> +++ b/fs/zonefs/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_ZONEFS_FS) += zonefs.o
> +
> +zonefs-y	:= super.o
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> new file mode 100644
> index 000000000000..5521c21fd34b
> --- /dev/null
> +++ b/fs/zonefs/super.c
[ .. ]

That whole thing looks good to me (with my limited fs skills :-),
however, some things I'd like to have clarified:

- zone state handling:
While you do have some handling for offline zones, I'm missing a
handling during normal I/O. Surely a zone can go offline via other means
(like the admin calling nasty user-space programs), which then would
result in an I/O error in the filesystem.
Shouldn't we handle this case when doing error handling?
IE shouldn't we look at the zone state when doing a REPORT ZONES, and
update it if required?
Similarly: How do we present zones which are not accessible? Will they
still show up in the directory? I think they should, but we should be
returning an error to userspace like EPERM or somesuch.

- zone sizes:
>From what I've seen sequential zones can be appended to, ie they'll
start off at 0 and will increase in size. Conventional zones, OTOH,
apparently always have a fixed size. Is that correct?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@xxxxxxx			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux