Re: [PATCH v13 2/2] zonefs: Add documentation

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Thu, 20 Feb 2020 00:59:29 +0000

On 2020/02/20 9:55, Randy Dunlap wrote:
> Hi Damien,
> 
> Typo etc. corrections below:

Thanks. Will correct these. Since this is now in the kernel, you can send a
patch too :)

> 
> On 2/6/20 7:16 PM, Damien Le Moal wrote:
>> Add the new file Documentation/filesystems/zonefs.txt to document
>> zonefs principles and user-space tool usage.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
>> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
>> ---
>>  Documentation/filesystems/zonefs.txt | 404 +++++++++++++++++++++++++++
>>  MAINTAINERS                          |   1 +
>>  2 files changed, 405 insertions(+)
>>  create mode 100644 Documentation/filesystems/zonefs.txt
>>
>> diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
>> new file mode 100644
>> index 000000000000..935bf22031ca
>> --- /dev/null
>> +++ b/Documentation/filesystems/zonefs.txt
>> @@ -0,0 +1,404 @@
>> +ZoneFS - Zone filesystem for Zoned block devices
>> +
>> +Introduction
>> +============
>> +
> ...
>> +
>> +Zoned block devices
>> +-------------------
>> +
> ...
>> +
>> +Zonefs Overview
>> +===============
>> +
> ...
> 
>> +
>> +On-disk metadata
>> +----------------
>> +
> ...
> 
>> +
>> +Zone type sub-directories
>> +-------------------------
>> +
> ...
> 
>> +
>> +Zone files
>> +----------
>> +
> ...
> 
>> +
>> +Conventional zone files
>> +-----------------------
>> +
> ...
> 
>> +
>> +Sequential zone files
>> +---------------------
>> +
>> +The size of sequential zone files grouped in the "seq" sub-directory represents
>> +the file's zone write pointer position relative to the zone start sector.
>> +
>> +Sequential zone files can only be written sequentially, starting from the file
>> +end, that is, write operations can only be append writes. Zonefs makes no
>> +attempt at accepting random writes and will fail any write request that has a
>> +start offset not corresponding to the end of the file, or to the end of the last
>> +write issued and still in-flight (for asynchrnous I/O operations).
>                                          asynchronous
> 
>> +
>> +Since dirty page writeback by the page cache does not guarantee a sequential
>> +write pattern, zonefs prevents buffered writes and writeable shared mappings
>> +on sequential files. Only direct I/O writes are accepted for these files.
>> +zonefs relies on the sequential delivery of write I/O requests to the device
>> +implemented by the block layer elevator. An elevator implementing the sequential
>> +write feature for zoned block devices (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
>> +must be used. This type of elevator (e.g. mq-deadline) is the set by default
> 
>                                                           is set by default
> 
>> +for zoned block devices on device initialization.
>> +
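
To illustrate the append-only rule in the section quoted above, here is a
minimal Python sketch. It is a toy model, not zonefs code: the class and
method names are hypothetical, and the use of OSError is an assumption (the
text above does not say which error code is returned).

```python
class SeqZoneFile:
    """Toy model of a sequential zone file (hypothetical names)."""

    def __init__(self):
        self.size = 0          # data written so far; mirrors the write pointer offset
        self.inflight_end = 0  # end offset of the last write issued and still in flight

    def submit_write(self, offset, length):
        # Accept only append writes: the start offset must match the file end,
        # or the end of the last write issued and still in flight.
        if offset != self.inflight_end:
            raise OSError("write does not start at the end of the file")
        self.inflight_end = offset + length

    def complete_writes(self):
        # Assume every in-flight write completed successfully.
        self.size = self.inflight_end
```

In this model, two back-to-back 4096 B writes at offsets 0 and 4096 are both
accepted, while a third write at offset 0 is rejected.
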
> ...
> 
>> +
>> +Format options
>> +--------------
>> +
> ...
> 
>> +
>> +IO error handling
>> +-----------------
>> +
> ...
> 
>> +
>> +
>> +* Unaligned write errors: These errors result from the host issuing write
>> +  requests with a start sector that does not correspond to a zone write pointer
>> +  position when the write request is executed by the device. Even though zonefs
>> +  enforces sequential file write for sequential zones, unaligned write errors
>> +  may still happen in the case of a partial failure of a very large direct I/O
>> +  operation split into multiple BIOs/requests or asynchronous I/O operations.
>> +  If one of the write requests within the set of sequential write requests
>> +  issued to the device fails, all write requests after queued after it will
> 
>                                            requests queued after it
> 
>> +  become unaligned and fail.
>> +
> ...
> 
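
The cascade described in the "Unaligned write errors" item above can be
sketched with a small Python simulation. This is only a model under stated
assumptions: the function name is hypothetical, and the failing request is
assumed not to advance the write pointer at all (a real device may advance it
partially).

```python
def run_queued_writes(write_pointer, writes, failing_index):
    """Simulate back-to-back sequential writes to one zone (toy model).

    writes is a list of (start, length) tuples; failing_index is the
    request assumed to fail on the device.
    """
    results = []
    for i, (start, length) in enumerate(writes):
        if i == failing_index:
            # Assumed device error: the write pointer does not move.
            results.append("failed")
        elif start != write_pointer:
            # Start no longer matches the write pointer position.
            results.append("unaligned")
        else:
            write_pointer += length
            results.append("ok")
    return write_pointer, results
```

With four queued writes of 4 sectors each starting at sector 0 and the second
one failing, the model reports ok, failed, unaligned, unaligned and leaves
the write pointer at sector 4.
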
>> +
>> +All I/O errors detected by zonefs are notified to the user with an error code
>> +return for the system call that trigered or detected the error. The recovery
> 
>                                    triggered
> 
>> +actions taken by zonefs in response to I/O errors depend on the I/O type (read
>> +vs write) and on the reason for the error (bad sector, unaligned writes or zone
>> +condition change).
>> +
> ...
> 
>> +
>> +Zonefs minimal I/O error recovery may change a file size and a file access
> 
>                                                             and file access
> 
>> +permissions.
>> +
>> +* File size changes:
>> +  Immediate or delayed write errors in a sequential zone file may cause the file
>> +  inode size to be inconsistent with the amount of data successfully written in
>> +  the file zone. For instance, the partial failure of a multi-BIO large write
>> +  operation will cause the zone write pointer to advance partially, even though
>> +  the entire write operation will be reported as failed to the user. In such
>> +  case, the file inode size must be advanced to reflect the zone write pointer
>> +  change and eventually allow the user to restart writing at the end of the
>> +  file.
>> +  A file size may also be reduced to reflect a delayed write error detected on
>> +  fsync(): in this case, the amount of data effectively written in the zone may
>> +  be less than originally indicated by the file inode size. After such I/O
>> +  error, zonefs always fixes a file inode size to reflect the amount of data
> 
>                           fixes the file inode size
> 
>> +  persistently stored in the file zone.
>> +
>> +* Access permission changes:
> ...
> 
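
For the "File size changes" item above, the size fix amounts to recomputing
the file size from the zone write pointer. A minimal Python sketch, assuming
512 B sectors; the function and parameter names are hypothetical and this is
not the zonefs implementation:

```python
def recovered_file_size(zone_start_sector, write_pointer_sector, sector_size=512):
    """File size in bytes implied by the zone write pointer (toy model).

    The size of a sequential zone file is the write pointer position
    relative to the zone start sector.
    """
    return (write_pointer_sector - zone_start_sector) * sector_size
```

The result can be larger than the inode size before the error (a partially
failed multi-BIO write) or smaller (a delayed write error detected on
fsync()).
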
>> +
>> +Further notes:
>> +* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
>> +  error processing if no errors mount option is specified.
>> +* With the "errors=remount-ro" mount option, the change of the file access
>> +  permissions to read-only applies to all files. The file system is remounted
>> +  read-only.
>> +* Access permission and file size changes due to the device transitioning zones
>> +  to the offline condition are permanent. Remounting or reformating the device
> 
>                                              usually:      reformatting
> 
>> +  with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good
>> +  state.
>> +* File access permission changes to read-only due to the device transitioning
>> +  zones to the read-only condition are permanent. Remounting or reformating
> 
>                                                                    reformatting
> 
>> +  the device will not re-enable file write access.
>> +* File access permission changes implied by the remount-ro, zone-ro and
>> +  zone-offline mount options are temporary for zones in a good condition.
>> +  Unmounting and remounting the file system will restore the previous default
>> +  (format time values) access rights to the files affected.
>> +* The repair mount option triggers only the minimal set of I/O error recovery
>> +  actions, that is, file size fixes for zones in a good condition. Zones
>> +  indicated as being read-only or offline by the device still imply changes to
>> +  the zone file access permissions as noted in the table above.
>> +
>> +Mount options
>> +-------------
>> +
>> +zonefs defines the "errors=<behavior>" mount option to allow the user to specify
>> +zonefs behavior in response to I/O errors, inode size inconsistencies or zone
>> +condition chages. The defined behaviors are as follows:
> 
>              changes.
> 
>> +* remount-ro (default)
>> +* zone-ro
>> +* zone-offline
>> +* repair
>> +
>> +The I/O error actions defined for each behavior is detailed in the previous
> 
>                                                    are
> 
>> +section.
>> +
>> +Zonefs User Space Tools
>> +=======================
>> +
> ...
>> +
>> +Examples
>> +--------
>> +
> ...
> 
> 
> HTH.
> 

-- 
Damien Le Moal
Western Digital Research