On Fri, Jul 22, 2016 at 12:09:55PM +0800, Zorro Lang wrote:
> On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> > This is the first try to document the implementation of error handlers into
> > sysfs.
> >
> > Reviews and comments are appreciated, please also notice I'm not english-native,
> > so, spelling corrections are also appreciated :)
> >
> > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> > ---
> >  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 78 insertions(+)
> >
> > diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> > index 8146e9f..1df868a 100644
> > --- a/Documentation/filesystems/xfs.txt
> > +++ b/Documentation/filesystems/xfs.txt
> > @@ -348,3 +348,81 @@ Removed Sysctls
> >    ----                          -------
> >    fs.xfs.xfsbufd_centisec       v4.0
> >    fs.xfs.age_buffer_centisecs   v4.0
> > +
> > +Error handling
> > +==============
> > +
> > +XFS can act differently according with the type of error found
> > +during its operation. The implementation introduces the following
> > +concepts to the error handler:
> > +
> > + -failure speed:
> > +        Defines how fast XFS should shutdown in case of a specific
> > +        error is found during the filesystem operation. It can
> > +        shutdown immediately, after a defined number of tries, or
> > +        simply try forever, which was the old behavior and is now
> > +        set as default behavior, except during unmount time, where
> > +        in case of a error is found while unmounting, the filesystem
> > +        will shutdown.
> > +
> > + -error classes:
> > +        Specifies the subsystem/location where the error handlers
> > +        configure the behavior for, such as metadata or memory allocation.
> > +
> > + -error handlers:
> > +        Defines the behavior for a specific error.
> > +
> > +The filesystem behavior during an error can be set via sysfs files, where, the
> > +errors are organized with the following structure:
> > +
> > +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> > +
> > +Each directory contains:
> > +
> > + /sys/fs/xfs/<dev>/error/
> > +
> > +        fail_at_unmount (Min: 0 Default: 1 Max: 1)
> > +                Defines the global error behavior during unmount time. If set to
> > +                "1", XFS will shutdown in case of any error is found, otherwise,
> > +                if set to "0", the filesystem will indefinitely retry to cleanly
> > +                unmount the filesystem.
>
> Hi Carlos,
>
> Could you explain more about the relationship of fail_at_unmount and
> max_retries(/retry_timeout_seconds). For example, if I set fail_at_unmount=0,
> and set EIO/max_retries=1, what's expected?
>

They are different options. If max_retries is set to 1, it will fail after
the first try as expected, even during unmount, and even if
fail_at_unmount = 0.

The problem, and the reason we added fail_at_unmount, is that you can't
change any configuration after umount is issued, because the sysfs directory
for the device being unmounted is detached from sysfs. So, if the sysadmin
wants XFS to retry forever on any error during normal filesystem operation,
he is still able to unmount the filesystem "properly" (in quotes because, if
the FS finds errors, it might not be a clean unmount) as long as he sets
fail_at_unmount. Otherwise, the umount process might get stuck forever.

> I'd like to write test case about this error handling, according to
> your document.
>
> Thanks,
> Zorro
>
> > +
> > +        <class> subdirectories
> > +                Contains specific error handlers configuration
> > +                (Ex: /sys/fs/xfs/<dev>/error/metadata).
> > +
> > + /sys/fs/xfs/<dev>/error/<class>/
> > +
> > +        The contents of this directory are <class> specific, since each <class>
> > +        might need to handle different types of errors.
> > +        All <error> directory
> > +        though, contains the "default" directory, which is a global configuration
> > +        for errors not available for independent configuration.
> > +
> > + /sys/fs/xfs/<dev>/error/<class>/<error>
> > +
> > +        Contains the failure speed configuration files for each specific error,
> > +        including the "default" behavior, which contains the same configuration
> > +        options as the specific errors.
> > +
> > +        The available configurations for each error type are:
> > +
> > +        max_retries (Min: -1 Default: -1 Max: INTMAX)
> > +                Define how many tries the filesystem is allowed to retry its
> > +                operations during the specific error, before shutdown the
> > +                filesystem. Setting this file to "-1", will set XFS to retry
> > +                forever in the specific error, setting it to "0", will make
> > +                XFS to fail immediately after the specific error is found,
> > +                while setting it to a "N" value, where N is greater than 0,
> > +                will make XFS retry "N" times before shutdown.
> > +
> > +        retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX)
> > +                Define the amount of time (in seconds) that the filesystem is
> > +                allowed to retry its operations when the specific error is
> > +                found. "0" means no wait time.
> > +
> > +
> > +        "max_retries" takes precedence over "retry_timeout_seconds", where,
> > +        "retry_timeout_seconds" will only be tested if the "max_retries" limit
> > +        were not reached yet or is set to retry forever ("-1"). If "max_retries"
> > +        limit is reached, the filesystem will shutdown, wether or not
> > +        "retry_timeout_seconds" has been reached.
> > --
> > 2.7.4
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@xxxxxxxxxxx
> > http://oss.sgi.com/mailman/listinfo/xfs

--
Carlos
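
For anyone writing test cases against this, here is a rough Python sketch of
the interaction between max_retries and fail_at_unmount as explained in the
reply above. It is only a model of the described behaviour, not the kernel
logic: the function names, the "retries_done" counter and the device name
"sda1" are made up for illustration, "EIO" under the "metadata" class is
taken from the question in the thread, and retry_timeout_seconds is left out
on purpose.

```python
import os


def error_knob(root, dev, cls, err, knob):
    """Build <root>/<dev>/error/<class>/<error>/<knob> (layout from the patch)."""
    return os.path.join(root, dev, "error", cls, err, knob)


def should_shutdown(retries_done, max_retries, unmounting, fail_at_unmount):
    """Simplified, hypothetical model of the documented behaviour.

    retries_done    -- retries already performed for this error (assumption)
    max_retries     -- -1 retries forever, 0 fails at once, N allows N retries
    unmounting      -- True once umount has been issued
    fail_at_unmount -- 1 shuts down on error at unmount, 0 keeps retrying
    """
    # A finite max_retries limit applies everywhere, including during unmount
    # and regardless of fail_at_unmount.
    if max_retries >= 0 and retries_done >= max_retries:
        return True
    # Below the limit (or retrying forever), only fail_at_unmount can end
    # the retries, and only at unmount time.
    return bool(unmounting and fail_at_unmount)


if __name__ == "__main__":
    # Path that the scenario from the thread would configure ("sda1" is a
    # placeholder device name).
    print(error_knob("/sys/fs/xfs", "sda1", "metadata", "EIO", "max_retries"))
    # fail_at_unmount=0 with EIO/max_retries=1: the limit still applies.
    print(should_shutdown(1, 1, True, 0))
    # Retry forever with fail_at_unmount=0: umount may never finish.
    print(should_shutdown(5, -1, True, 0))
```

In this model, the case asked about (fail_at_unmount=0, EIO/max_retries=1)
shuts down once the single retry is used up, while max_retries=-1 combined
with fail_at_unmount=0 never does, which is the stuck-umount situation that
fail_at_unmount exists to avoid.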