Re: [PATCH] xfs: Document error handling behavior

On Tue, Jul 19, 2016 at 12:04:17PM +0200, Carlos Maiolino wrote:
> This is the first try to document the implementation of error handlers into
> sysfs.
> 
> Reviews and comments are appreciated, please also notice I'm not english-native,
> so, spelling corrections are also appreciated :)
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> ---
>  Documentation/filesystems/xfs.txt | 78 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 8146e9f..1df868a 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -348,3 +348,81 @@ Removed Sysctls
>    ----				-------
>    fs.xfs.xfsbufd_centisec	v4.0
>    fs.xfs.age_buffer_centisecs	v4.0
> +
> +Error handling
> +==============
> +
> +XFS can act differently according to the type of error found
> +during its operation. The implementation introduces the following
> +concepts to the error handler:
> +
> + -failure speed:
> +	Defines how fast XFS should shut down when a specific
> +	error is found during filesystem operation. It can
> +	shut down immediately, after a defined number of retries,
> +	or simply retry forever. Retrying forever was the old
> +	behavior and is now the default, except at unmount time:
> +	if an error is found while unmounting, the filesystem
> +	will shut down.
> +
> + -error classes:
> +	Specifies the subsystem or location whose error behavior is
> +	being configured, such as metadata or memory allocation.
> +
> + -error handlers:
> +	Defines the behavior for a specific error.
> +
> +The filesystem behavior on error can be set via sysfs files, where the
> +errors are organized in the following structure:
> +
> +  /sys/fs/xfs/<dev>/error/<class>/<error>/
> +
> +Each directory contains:
> +
> + /sys/fs/xfs/<dev>/error/
> +
> +	fail_at_unmount		(Min:  0  Default:  1  Max: 1)
> +		Defines the global error behavior at unmount time. If set to
> +		"1", XFS will shut down if any error is found. If set to
> +		"0", the filesystem will retry indefinitely to unmount
> +		cleanly.

Hi Carlos,

Could you explain more about the relationship between fail_at_unmount and
max_retries (or retry_timeout_seconds)? For example, if I set fail_at_unmount=0
and EIO/max_retries=1, what is the expected behavior?
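
To make the question concrete, here is how I currently read the precedence
paragraph at the end of the patch, written as a small Python model. It is only
my interpretation of the text, not the kernel code: the function name and its
arguments are made up, and I am assuming that retry_timeout_seconds=0 means
"no time limit". It leaves out fail_at_unmount on purpose, since that
interaction is exactly what I am unsure about.

```python
RETRY_FOREVER = -1  # documented "retry forever" value of max_retries


def should_shut_down(retries_done, elapsed_seconds,
                     max_retries=RETRY_FOREVER, retry_timeout_seconds=0):
    """Return True if, on my reading of the document, XFS would give up.

    retries_done:    retries already attempted for this error (hypothetical)
    elapsed_seconds: time since the first failure (hypothetical)
    """
    # max_retries is checked first and wins once its limit is reached.
    if max_retries != RETRY_FOREVER and retries_done >= max_retries:
        return True
    # ASSUMPTION: a timeout of 0 means "no time limit", not "give up now".
    if retry_timeout_seconds > 0 and elapsed_seconds >= retry_timeout_seconds:
        return True
    return False
```

With max_retries=1 this model retries once and then shuts down, whatever the
timeout says. If the real behavior differs, the document probably needs to say
so.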

I'd like to write a test case for this error handling, based on
your document.
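
As a starting point, the setup step of such a test might look like the sketch
below. The helper names are hypothetical, and "metadata/EIO" is an assumption
that combines the example class from the patch with the error from my
question. The fake tree exists only so the helpers can be tried without an XFS
mount.

```python
import os
import tempfile


def set_knob(error_root, relative_path, value):
    """Write an integer to <error_root>/<relative_path> and return the path."""
    path = os.path.join(error_root, relative_path)
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


def get_knob(error_root, relative_path):
    """Read an integer knob back from <error_root>/<relative_path>."""
    with open(os.path.join(error_root, relative_path)) as f:
        return int(f.read().strip())


def make_fake_error_tree():
    """Build a throwaway directory mimicking the documented layout (dry runs only)."""
    root = tempfile.mkdtemp(prefix="xfs-error-")
    # ASSUMPTION: class "metadata" with an "EIO" error directory.
    os.makedirs(os.path.join(root, "metadata", "EIO"))
    # Seed each file with the default value given in the document.
    for rel, default in (("fail_at_unmount", 1),
                         ("metadata/EIO/max_retries", -1),
                         ("metadata/EIO/retry_timeout_seconds", 0)):
        set_knob(root, rel, default)
    return root
```

On a real system, error_root would be /sys/fs/xfs/<dev>/error with <dev>
filled in, and the writes would need root. The scenario from my question is
then set_knob(root, "fail_at_unmount", 0) followed by
set_knob(root, "metadata/EIO/max_retries", 1).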

Thanks,
Zorro

> +
> +	<class> subdirectories
> +		Contain the error handler configuration for each class
> +		(Ex: /sys/fs/xfs/<dev>/error/metadata).
> +
> + /sys/fs/xfs/<dev>/error/<class>/
> +
> +	The contents of this directory are <class> specific, since each <class>
> +	might need to handle different types of errors. Every <class> directory,
> +	though, contains a "default" directory, which holds the configuration
> +	for errors that cannot be configured independently.
> +
> + /sys/fs/xfs/<dev>/error/<class>/<error>
> +
> +	Contains the failure speed configuration files for each specific error,
> +	including the "default" behavior, which contains the same configuration
> +	options as the specific errors.
> +
> +	The available configurations for each error type are:
> +
> +	max_retries			(Min: -1  Default: -1  Max: INTMAX)
> +		Defines how many times the filesystem is allowed to retry its
> +		operations on the specific error before shutting down the
> +		filesystem. Setting this file to "-1" makes XFS retry
> +		forever on the specific error, setting it to "0" makes
> +		XFS fail immediately after the specific error is found,
> +		and setting it to "N", where N is greater than 0,
> +		makes XFS retry "N" times before shutting down.
> +
> +	retry_timeout_seconds		(Min:  0  Default:  0  Max: INTMAX)
> +		Defines the amount of time (in seconds) that the filesystem is
> +		allowed to retry its operations when the specific error is
> +		found. "0" means no wait time.
> +
> +
> +	"max_retries" takes precedence over "retry_timeout_seconds":
> +	"retry_timeout_seconds" is only tested if the "max_retries" limit
> +	has not been reached yet or is set to retry forever ("-1"). If the
> +	"max_retries" limit is reached, the filesystem will shut down, whether
> +	or not "retry_timeout_seconds" has been reached.
> -- 
> 2.7.4
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs



