Hi Damien,

Typo etc. corrections below:

On 2/6/20 7:16 PM, Damien Le Moal wrote:
> Add the new file Documentation/filesystems/zonefs.txt to document
> zonefs principles and user-space tool usage.
>
> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  Documentation/filesystems/zonefs.txt | 404 +++++++++++++++++++++++++++
>  MAINTAINERS                          |   1 +
>  2 files changed, 405 insertions(+)
>  create mode 100644 Documentation/filesystems/zonefs.txt
>
> diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
> new file mode 100644
> index 000000000000..935bf22031ca
> --- /dev/null
> +++ b/Documentation/filesystems/zonefs.txt
> @@ -0,0 +1,404 @@
> +ZoneFS - Zone filesystem for Zoned block devices
> +
> +Introduction
> +============
> + ...
> +
> +Zoned block devices
> +-------------------
> + ...
> +
> +Zonefs Overview
> +===============
> + ...
> +
> +On-disk metadata
> +----------------
> + ...
> +
> +Zone type sub-directories
> +-------------------------
> + ...
> +
> +Zone files
> +----------
> + ...
> +
> +Conventional zone files
> +-----------------------
> + ...
> +
> +Sequential zone files
> +---------------------
> +
> +The size of sequential zone files grouped in the "seq" sub-directory represents
> +the file's zone write pointer position relative to the zone start sector.
> +
> +Sequential zone files can only be written sequentially, starting from the file
> +end, that is, write operations can only be append writes. Zonefs makes no
> +attempt at accepting random writes and will fail any write request that has a
> +start offset not corresponding to the end of the file, or to the end of the last
> +write issued and still in-flight (for asynchrnous I/O operations).

                                         asynchronous

> +
> +Since dirty page writeback by the page cache does not guarantee a sequential
> +write pattern, zonefs prevents buffered writes and writeable shared mappings
> +on sequential files. Only direct I/O writes are accepted for these files.
> +zonefs relies on the sequential delivery of write I/O requests to the device
> +implemented by the block layer elevator. An elevator implementing the sequential
> +write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
> +must be used. This type of elevator (e.g. mq-deadline) is the set by default

                                                          is set by default

> +for zoned block devices on device initialization.
> + ...
> +
> +Format options
> +--------------
> + ...
> +
> +IO error handling
> +-----------------
> + ...
> +
> +
> +* Unaligned write errors: These errors result from the host issuing write
> +  requests with a start sector that does not correspond to a zone write pointer
> +  position when the write request is executed by the device. Even though zonefs
> +  enforces sequential file write for sequential zones, unaligned write errors
> +  may still happen in the case of a partial failure of a very large direct I/O
> +  operation split into multiple BIOs/requests or asynchronous I/O operations.
> +  If one of the write request within the set of sequential write requests
> +  issued to the device fails, all write requests after queued after it will

                                         requests queued after it

> +  become unaligned and fail.
> + ...
> +
> +All I/O errors detected by zonefs are notified to the user with an error code
> +return for the system call that trigered or detected the error. The recovery

                                   triggered

> +actions taken by zonefs in response to I/O errors depend on the I/O type (read
> +vs write) and on the reason for the error (bad sector, unaligned writes or zone
> +condition change).
> + ...
> +
> +Zonefs minimal I/O error recovery may change a file size and a file access

                                                             and file access

> +permissions.
> +
> +* File size changes:
> +  Immediate or delayed write errors in a sequential zone file may cause the file
> +  inode size to be inconsistent with the amount of data successfully written in
> +  the file zone. For instance, the partial failure of a multi-BIO large write
> +  operation will cause the zone write pointer to advance partially, even though
> +  the entire write operation will be reported as failed to the user. In such
> +  case, the file inode size must be advanced to reflect the zone write pointer
> +  change and eventually allow the user to restart writing at the end of the
> +  file.
> +  A file size may also be reduced to reflect a delayed write error detected on
> +  fsync(): in this case, the amount of data effectively written in the zone may
> +  be less than originally indicated by the file inode size. After such I/O
> +  error, zonefs always fixes a file inode size to reflect the amount of data

                          fixes the file inode size

> +  persistently stored in the file zone.
> +
> +* Access permission changes: ...
> +
> +Further notes:
> +* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
> +  error processing if no errors mount option is specified.
> +* With the "errors=remount-ro" mount option, the change of the file access
> +  permissions to read-only applies to all files. The file system is remounted
> +  read-only.
> +* Access permission and file size changes due to the device transitioning zones
> +  to the offline condition are permanent. Remounting or reformating the device

                                        usually:       reformatting

> +  with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good
> +  state.
> +* File access permission changes to read-only due to the device transitioning
> +  zones to the read-only condition are permanent. Remounting or reformating

                                                                  reformatting

> +  the device will not re-enable file write access.
> +* File access permission changes implied by the remount-ro, zone-ro and
> +  zone-offline mount options are temporary for zones in a good condition.
> +  Unmounting and remounting the file system will restore the previous default
> +  (format time values) access rights to the files affected.
> +* The repair mount option triggers only the minimal set of I/O error recovery
> +  actions, that is, file size fixes for zones in a good condition. Zones
> +  indicated as being read-only or offline by the device still imply changes to
> +  the zone file access permissions as noted in the table above.
> +
> +Mount options
> +-------------
> +
> +zonefs define the "errors=<behavior>" mount option to allow the user to specify
> +zonefs behavior in response to I/O errors, inode size inconsistencies or zone
> +condition chages. The defined behaviors are as follow:

             changes.

> +* remount-ro (default)
> +* zone-ro
> +* zone-offline
> +* repair
> +
> +The I/O error actions defined for each behavior is detailed in the previous

                                                   are

> +section.
> +
> +Zonefs User Space Tools
> +=======================
> + ...
> +
> +Examples
> +--------
> + ...

HTH.
-- 
~Randy