On Wed, Feb 05, 2020 at 09:08:37PM +0900, Damien Le Moal wrote: > Add the new file Documentation/filesystems/zonefs.txt to document > zonefs principles and user-space tool usage. > > Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> ..... Just looking at the error handling text... > +Several optional features of zonefs can be enabled at format time. > +* Conventional zone aggregation: ranges of contiguous conventional zones can be > + aggregated into a single larger file instead of the default one file per zone. > +* File ownership: The owner UID and GID of zone files is by default 0 (root) > + but can be changed to any valid UID/GID. > +* File access permissions: the default 640 access permissions can be changed. > + > +zonefs mount options > +-------------------- This section is really all about error handling, not so much mount options in general... > + > +zonefs defines several mount options allowing the user to control the file > +system behavior when write I/O errors occur and when inconsistencies between a > +file size and its zone write pointer position are discovered. The handling of > +read I/O errors is not changed by these options as long as no inode size > +corruption is detected. > + > +These options are as follows. > +* errors=remount-ro (default) > + All write IO errors and errors due to a zone of the device going "bad" > + (condition changed to offline or read-only), the file system is remounted > + read-only after fixing the size and access permissions of the inode that > + suffered the IO error. What does "fixing the size and access permissions of the inode" mean? > +* errors=zone-ro > + Any write IO error to a file zone result in the zone being considered as in a > + read-only condition, preventing any further modification to the file. This > + option does not affect the handling of errors due to offline zones. For these > + zones, all accesses (read and write) are disabled. If the zone is marked RO, then shouldn't reads still work?. Oh, hold on, you're now talking about errors that take the zone oflfine at the device level? Perhaps a table describing what IO can be done to a zone vs the device once an error occurs would be a clearer way of describing the behaviour. It seems to me that a table might be a better way of decribing all the different conditions Post error access permissions zone device mountopt zone state read write read write -------- ---------- ---- ----- ---- ----- remount-ro good yes no yes no RO yes no yes no Offline no no yes no zone-ro good yes no yes yes RO yes no yes yes Offline no no yes yes zone-offline good no no yes yes RO no no yes yes Offline no no yes yes repair good yes yes yes yes RO yes no yes yes Offline no no yes yes And then you can document that an offline zone will always appear to have a size of 0, be immutable, etc, while a read-only zone will have a size that reflects the amount of valid data in the zone that can be read. IOWs, you don't need to mix the definitions of zone state appearence and behaviour with descriptions of what actions the different mount options take when a write error occurs. > +* errors=zone-offline > + Any write IO error to a file zone result in the zone being considered as in > + an offline condition. This implies that the file size is changed to 0 and all > + read/write accesses to the file disabled, preventing all accesses by the user. > +* errors=repair > + Any inconsistency between an inode size and its zone amount of written data > + due to IO errors or external corruption are fixed without any change to file > + access rights. This option does not affect the processing of zones that were > + signaled as read-only or offline by the device. For read-only zones, the file > + read accesses are disabled and for offline zones, all access permissions are > + removed. > + > +For sequential zone files, inconsistencies between an inode size and the amount > +of data writen in its zone, that is, the position of the file zone write > +pointer, can result from different events: > +* When the device write cache is enabled, a differed write error can occur "a different write error"? > + resulting in the amount of data written in the zone being less than the inode > + size. Though I suspect that what you really mean to say is that errors can occur in ranges of previously completed writes can occur when the cache is flushed, hence less data being physically written than the OS has previously be told was written by the hardware. i.e. visible inode size goes backwards. > +* Partial failures of large write I/O operations (e.g. one BIO of a multi-bio > + large direct write fails) can result in the amount of data written in the > + zone being larger than the inode size. > +* External action on the disk such as write, zone reset or zone finish > + operations will change a file zone write pointer position resulting in a > + reported amount of written data being different from the file inode size. *nod* > +Finally, defective drives may change the condition of any zone to offline (zone > +dead) or read-only. Such changes, when discovered with the IO errors they can > +cause, are handled automatically regardless of the options specified at mount > +time. For offline zones, the action taken is similar to the action defined by > +the errors=zone-offline mount option. For read-only zones, the action used is > +as defined by the errors=zone-ro mount option. Hmmmm. I think that's over-complicating things and takes control of error handling away from the user. That is, regardless of the reason for the error, if we get a write error and the user specified errors=remount-ro, the entire device should go read-only because that's what the user has told the filesystem to do on write error. This seems pretty user-unfriendly - giving them a way to control error handling behaviour and then ignoring it for specific errors despite the fact they mean exactly the same thing to the user: the write failed because a zone has gone bad since the last time that zone was accessed by the application.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx