Hi folks, any comments on this?

Cheers

On Tue, Aug 09, 2016 at 05:15:24AM -0400, Carlos Maiolino wrote:
> Document the implementation of error handlers into sysfs.
>
> Changelog:
>
> V2:
>     - Add a description of the precedence order of each option, focusing on
>       the behavior of "fail_at_unmount" which was not well explained in V1
>
> Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> ---
>  Documentation/filesystems/xfs.txt | 94 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
>
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 8146e9f..d483e0b 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -348,3 +348,97 @@ Removed Sysctls
>    ----                          -------
>    fs.xfs.xfsbufd_centisec       v4.0
>    fs.xfs.age_buffer_centisecs   v4.0
> +
> +Error handling
> +==============
> +
> +XFS can act differently according to the type of error found
> +during its operation. The implementation introduces the following
> +concepts to the error handler:
> +
> + -failure speed:
> +        Defines how fast XFS should shut down when a specific
> +        error is found during filesystem operation. It can shut
> +        down immediately, after a defined number of retries, or
> +        simply retry forever, which was the old behavior and is now
> +        the default behavior. The exception is unmount time: if an
> +        error is found while unmounting, the filesystem will shut
> +        down.
> +
> + -error classes:
> +        Specify the subsystem or location whose error behavior the
> +        handlers configure, such as metadata or memory allocation.
> +
> + -error handlers:
> +        Define the behavior for a specific error.
> +
> +The filesystem behavior during an error can be set via sysfs files, where the
> +errors are organized in the following structure:
> +
> +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> +
> +Each directory contains:
> +
> + /sys/fs/xfs/<dev>/error/
> +
> +  fail_at_unmount               (Min:  0  Default:  1  Max: 1)
> +        Defines the global error behavior at unmount time. If set to
> +        "1", XFS will shut down if any error is found; if set to
> +        "0", the filesystem will keep retrying indefinitely in an
> +        attempt to unmount cleanly.
> +
> +  <class> subdirectories
> +        Contain the configuration of specific error handlers
> +        (e.g. /sys/fs/xfs/<dev>/error/metadata).
> +
> + /sys/fs/xfs/<dev>/error/<class>/
> +
> +        The contents of this directory are <class> specific, since each <class>
> +        might need to handle different types of errors. Every <class> directory,
> +        though, contains a "default" directory, which holds the configuration
> +        for errors that have no individual configuration of their own.
> +
> + /sys/fs/xfs/<dev>/error/<class>/<error>
> +
> +        Contains the failure speed configuration files for each specific error,
> +        including the "default" error, which provides the same configuration
> +        options as the specific errors.
> +
> +        The available configuration files for each error type are:
> +
> +        max_retries             (Min: -1  Default: -1  Max: INTMAX)
> +                Defines how many times the filesystem is allowed to retry its
> +                operations after the specific error before shutting down the
> +                filesystem. Setting this file to "-1" makes XFS retry
> +                forever on the specific error, setting it to "0" makes
> +                XFS fail immediately after the specific error is found,
> +                and setting it to a value "N", where N is greater than 0,
> +                makes XFS retry "N" times before shutting down.
> +
> +        retry_timeout_seconds   (Min:  0  Default:  0  Max: INTMAX)
> +                Defines the amount of time (in seconds) that the filesystem is
> +                allowed to retry its operations when the specific error is
> +                found. "0" means no time limit is applied.
> +
> +
> +
> +        Order of precedence:
> +                "max_retries" takes precedence over "retry_timeout_seconds":
> +                "retry_timeout_seconds" is only tested if the "max_retries"
> +                limit has not been reached yet or is set to retry
> +                forever ("-1"). If the "max_retries" limit is reached, the
> +                filesystem will shut down, whether or not "retry_timeout_seconds"
> +                has been reached.
> +
> +        "fail_at_unmount", on the other hand, works independently of the
> +        remaining options. It is only tested at unmount time,
> +        but it shuts the filesystem down regardless of the limits
> +        set in "max_retries" or "retry_timeout_seconds".
> +        It was added because the sysfs configuration cannot be changed
> +        after an unmount is triggered, since the sysfs directory of
> +        the filesystem being unmounted is detached from the sysfs
> +        tree. So, even if the sysadmin wants XFS to retry forever
> +        on any error during filesystem operation, the filesystem
> +        can still be unmounted properly if an error was detected and
> +        "fail_at_unmount" is set. Otherwise, the unmount process could
> +        hang forever.
> --
> 2.5.5
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

--
Carlos
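The "Order of precedence" rules in the quoted patch can be made concrete with a small C model of the decision, as read from the text. This is a hypothetical sketch, not the kernel's implementation. The struct and function names (xfs_error_cfg_model, should_shutdown) are made up for illustration. The model makes two assumptions. First, retry_timeout_seconds = 0 means "no time limit", which is consistent with the default being "retry forever". Second, with fail_at_unmount = 0 the normal per-error limits still apply during unmount.

```c
#include <stdbool.h>

/*
 * Hypothetical model of one error's configuration, mirroring the two
 * per-error sysfs files described in the patch. Not kernel code.
 */
struct xfs_error_cfg_model {
    int max_retries;           /* -1: retry forever, 0: fail at once, N > 0: N retries */
    int retry_timeout_seconds; /* 0: no time limit (assumed); otherwise a limit in seconds */
};

/*
 * Decide whether the filesystem should shut down instead of retrying.
 *   retries         - retries already performed for this error
 *   elapsed         - seconds since the error was first retried
 *   unmounting      - true while an unmount is in progress
 *   fail_at_unmount - value of /sys/fs/xfs/<dev>/error/fail_at_unmount
 */
static bool should_shutdown(const struct xfs_error_cfg_model *cfg,
                            int retries, int elapsed,
                            bool unmounting, bool fail_at_unmount)
{
    /* fail_at_unmount is only consulted during unmount, and it wins over
     * both per-error limits. */
    if (unmounting && fail_at_unmount)
        return true;

    /* max_retries is checked first: once reached, shut down whether or
     * not retry_timeout_seconds has expired. */
    if (cfg->max_retries != -1 && retries >= cfg->max_retries)
        return true;

    /* Reached only if max_retries is -1 or not yet exhausted. */
    if (cfg->retry_timeout_seconds > 0 && elapsed >= cfg->retry_timeout_seconds)
        return true;

    /* Otherwise keep retrying. */
    return false;
}
```

With the defaults (max_retries = -1, retry_timeout_seconds = 0), the model keeps retrying while the filesystem is mounted. It only gives up during unmount when fail_at_unmount is 1, which matches the behavior the text describes.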
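The sysfs layout described in the patch (/sys/fs/xfs/<dev>/error/<class>/<error>/) can be illustrated with a helper that builds the path of one configuration file and writes an integer to it. This is a hypothetical sketch. The function names are illustrative, and so is the device name "sda1" used below. The helper also rejects path components containing '/', which is a defensive choice of this sketch and not something the patch specifies.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build /sys/fs/xfs/<dev>/error/<class>/<error>/<knob>
 * into buf. Returns 0 on success, or -1 if a component is empty, contains
 * '/', or the result does not fit in buf.
 */
static int xfs_error_knob_path(char *buf, size_t size, const char *dev,
                               const char *err_class, const char *error,
                               const char *knob)
{
    const char *parts[] = { dev, err_class, error, knob };
    int n;

    /* Reject components that would escape their directory level. */
    for (size_t i = 0; i < sizeof(parts) / sizeof(parts[0]); i++)
        if (parts[i] == NULL || parts[i][0] == '\0' || strchr(parts[i], '/'))
            return -1;

    n = snprintf(buf, size, "/sys/fs/xfs/%s/error/%s/%s/%s",
                 dev, err_class, error, knob);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical helper: write an integer to a knob file (e.g. max_retries).
 * Returns 0 on success, -1 on any I/O error.
 */
static int xfs_error_knob_write(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    /* stdio buffers the value, so a rejected write only shows up when
     * fclose() flushes it; check its result too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, the arguments ("sda1", "metadata", "default", "max_retries") produce /sys/fs/xfs/sda1/error/metadata/default/max_retries. Per the text, writing 0 there would make metadata errors that have no individual configuration fail immediately. Writing to these files typically requires root and a mounted XFS filesystem.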