On 11/17/17 1:57 PM, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > Document the XFS-specific metadata scrub/repair ioctl's behavior, > arguments, and side effects. > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > man/Makefile | 2 > man/man2/Makefile | 22 +++ > man/man2/ioctl_xfs_scrub_metadata.2 | 298 +++++++++++++++++++++++++++++++++++ > 3 files changed, 321 insertions(+), 1 deletion(-) > create mode 100644 man/man2/Makefile > create mode 100644 man/man2/ioctl_xfs_scrub_metadata.2 > > > diff --git a/man/Makefile b/man/Makefile > index 863284c..cae891f 100644 > --- a/man/Makefile > +++ b/man/Makefile > @@ -5,7 +5,7 @@ > TOPDIR = .. > include $(TOPDIR)/include/builddefs > > -SUBDIRS = man3 man5 man8 > +SUBDIRS = man2 man3 man5 man8 > > default : $(SUBDIRS) > > diff --git a/man/man2/Makefile b/man/man2/Makefile > new file mode 100644 > index 0000000..8aecde3 > --- /dev/null > +++ b/man/man2/Makefile > @@ -0,0 +1,22 @@ > +# > +# Copyright (c) 2000-2001 Silicon Graphics, Inc. All Rights Reserved. > +# > + > +TOPDIR = ../.. > +include $(TOPDIR)/include/builddefs > + > +MAN_SECTION = 2 > + > +MAN_PAGES = $(shell echo *.$(MAN_SECTION)) > +MAN_DEST = $(PKG_MAN_DIR)/man$(MAN_SECTION) > +LSRCFILES = $(MAN_PAGES) > + > +default : $(MAN_PAGES) > + > +include $(BUILDRULES) > + > +install : > + > +install-dev : default > + $(INSTALL) -m 755 -d $(MAN_DEST) > + $(INSTALL_MAN) > diff --git a/man/man2/ioctl_xfs_scrub_metadata.2 b/man/man2/ioctl_xfs_scrub_metadata.2 > new file mode 100644 > index 0000000..fa1c56d > --- /dev/null > +++ b/man/man2/ioctl_xfs_scrub_metadata.2 > @@ -0,0 +1,298 @@ > +.\" Copyright (c) 2017, Oracle. All rights reserved. > +.\" > +.\" %%%LICENSE_START(GPLv2+_DOC_FULL) > +.\" This is free documentation; you can redistribute it and/or > +.\" modify it under the terms of the GNU General Public License as > +.\" published by the Free Software Foundation; either version 2 of > +.\" the License, or (at your option) any later version. > +.\" > +.\" The GNU General Public License's references to "object code" > +.\" and "executables" are to be interpreted as the output of any > +.\" document formatting or typesetting system, including > +.\" intermediate and printed output. > +.\" > +.\" This manual is distributed in the hope that it will be useful, > +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of > +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +.\" GNU General Public License for more details. > +.\" > +.\" You should have received a copy of the GNU General Public > +.\" License along with this manual; if not, see > +.\" <http://www.gnu.org/licenses/>. > +.\" %%%LICENSE_END > +.TH IOCTL-XFS-SCRUB-METADATA 2 2017-09-21 "Linux" "Linux Programmer's Manual" This gets so long it doens't fit for me: IOCTL-XFS-SCRUB-METADATA(2)Linux Programmer’s ManuaIOCTL-XFS-SCRUB-METADATA(2) would just "xfs_scrub_metadata" be better? (drop the ioctl-?) > +.SH NAME > +ioctl_xfs_scrub_metadata \- check XFS filesystem metadata if so then here too > +.SH SYNOPSIS > +.br > +.B #include <xfs/xfs_fs.h> > +.PP > +.BI "int ioctl(int " dest_fd ", XFS_IOC_SCRUB_METADATA, struct xfs_scrub_metadata *" arg ); > +.SH DESCRIPTION > +This XFS ioctl asks the kernel driver to examine a piece of metadata for > +errors or suboptimal metadata. This ioctl asks the xfs kernel driver to examine a piece of filesystem metadata ... > +Examination includes running the metadata verifiers, checking records (total editor nitpick: drop "the"?) > +for obviously incorrect or impossible values, and cross-referencing each > +record with any other available metadata in the filesystem. > +This ioctl can also try to repair or optimize metadata, though this may > +tie up the filesystem for a long period of time. "tie up" is pretty colloquial. "Render the filesystem unavailable?" "Block normal filesystem operations?" > +The type and location of the metadata the metadata to scrub is conveyed in a structure of the > +following form: > +.PP > +.in +4n > +.EX > +struct xfs_scrub_metadata { > + __u32 sm_type; > + __u32 sm_flags; > + __u64 sm_ino; > + __u32 sm_gen; > + __u32 sm_agno; > + __u64 sm_reserved[5]; > +}; > +.EE this doesn't format properly on my rhel6 box but does on my rhel7 box ...? weird. This works for me, modeled on stat(2): .nf struct xfs_scrub_metadata { __u32 sm_type; __u32 sm_flags; __u64 sm_ino; __u32 sm_gen; __u32 sm_agno; __u64 sm_reserved[5]; }; .fi > +.in > +.PP > +The field > +.I sm_reserved > +must be zero. > +.PP > +The field > +.I sm_type > +indicates the type of metadata to check: > +.RS 0.4i > +.TP > +.B XFS_SCRUB_TYPE_PROBE > +Probe the kernel to see if it is willing to try to check or repair this > +filesystem. > +If any > +.B sm_flags > +output flags are set in > +.BR sm_gen ", " > +they will be copied to > +.B sm_flags > +before the call returns. Must sm_ino, sm_gen, or sm_agno be zero? (I think so) "All other fields must be zero?" (Actually,this seems a bit inconsistent in the kernel; some accept only the proper subset of the above 3, others don't seem to care at all?) In general for each type, if there are (or should be?) restrictions on other fields that should be noted for each I think, or have a blanket statement that says any fields not specifically referenced under a scrub type /must/ be zero. > + > +.PD 0 > +.PP > +.nf > +.B XFS_SCRUB_TYPE_SB > +.B XFS_SCRUB_TYPE_AGF > +.B XFS_SCRUB_TYPE_AGFL > +.fi > +.TP > +.B XFS_SCRUB_TYPE_AGI > +Examine a given allocation group's superblock, free space header, free > +block list, or inode header, respectively. > +Headers are checked for obviously incorrect values and cross-referenced > +against the allocation group's metadata btrees, if possible. > +The allocation group number must be given in > +.BR sm_agno "." > + > +.PP > +.nf > +.B XFS_SCRUB_TYPE_BNOBT > +.B XFS_SCRUB_TYPE_CNTBT > +.B XFS_SCRUB_TYPE_INOBT > +.B XFS_SCRUB_TYPE_FINOBT > +.B XFS_SCRUB_TYPE_RMAPBT > +.fi > +.TP > +.B XFS_SCRUB_TYPE_REFCNTBT > +Examine a given allocation group's free space btrees, inode btress, reverse inode btrees > +mapping btrees, or reference count btrees, respectively. You list 6 types but only 4 descriptions, "respectively" > +against the allocation group's metadata btrees, if possible. sentence fragment; comma after "respectively?" > +Space extent records are checked for obviously incorrect values and > +cross-referenced with the other space extent metadata to ensure that > +there are no conflicts. > +The allocation group number must be given in > +.BR sm_agno "." > + > +.TP > +.B XFS_SCRUB_TYPE_INODE > +Examine a given inode's inode record for obviously incorrect values and inode's inode record? maybe just "examine a given inode for obviously ...?" > +discrepancies with the rest of filesystem metadata. > +Parent pointers are checked for impossible inode values and are then > +followed up to the parent directory to ensure that the linkage makes > +sense. is correct. > +The inode to examine can be specified either through s/can be/may be/ manpage-wide (seems more man-style? take it or leave it ...) > +.B sm_ino > +and > +.BR sm_gen "; " > +if not specified, then the file described by > +.B dest_fd > +will be examined. > + > +.PP > +.nf > +.B XFS_SCRUB_TYPE_BMBTD > +.B XFS_SCRUB_TYPE_BMBTA > +.fi > +.TP > +.B XFS_SCRUB_TYPE_BMBTC why are these set apart with .nf/.fi? > +Examine a given inode's data block map, extended attribute block map, > +copy on write block map, or parent inode pointer. 4 descriptions given for only 3 types. > +Inode records are examined for obviously incorrect values and > +discrepancies with the three block map types. > +The block maps are checked for obviously wrong values and > +cross-referenced with the allocation group space extent metadata for > +discrepancies. > +The inode to examine can be specified in the same manner as > +.BR XFS_SCRUB_TYPE_INODE "." > + > +.TP > +.B XFS_SCRUB_TYPE_XATTR > +Examine the extended attribute records and indices of a given inode for > +incorrect pointers and other signs of damage. > +The inode to examine can be specified in the same manner as > +.BR XFS_SCRUB_TYPE_INODE "." > + > +.TP > +.B XFS_SCRUB_TYPE_DIR > +Examine the entries in a given directory for invalid data or dangling pointers. > +The directory to examine can be specified in the same manner as > +.BR XFS_SCRUB_TYPE_INODE "." > + > +.TP > +.B XFS_SCRUB_TYPE_SYMLINK > +Examine the target of a symbolic link for obvious pathname problems. > +The link to examine can be specified in the same manner as > +.BR XFS_SCRUB_TYPE_INODE "." > + > +.PP > +.nf > +.B XFS_SCRUB_TYPE_RTBITMAP > +.fi > +.TP > +.B XFS_SCRUB_TYPE_RTSUM > +Examine the realtime block bitmap and realtime summary inodes for > +corruption. > + > +.PP > +.nf > +.B XFS_SCRUB_TYPE_UQUOTA > +.B XFS_SCRUB_TYPE_GQUOTA > +.fi > +.TP > +.B XFS_SCRUB_TYPE_PQUOTA > +Examine all user, group, or project quota records for corruption. > +.RE > + > +.PD 1 > +.PP > +The field > +.I sm_flags > +control the behavior of the scrub operation and provide more information > +about the outcome of the operation. is it worth highlighting IFLAG vs. OFLAG? (Presumbably OFLAGs must not be set going in?) > +If none of the > +.B XFS_SCRUB_OFLAG_* > +flags are set upon return, the metadata is clean. > +.RS 0.4i > +.TP > +.B XFS_SCRUB_IFLAG_REPAIR > +If the caller sets this flag, the checker will examine the metadata and > +try to fix any problems or suboptimal metadata that it finds. This reader is left wondering what constitutes "suboptimal?" > +If no errors occur during the repair operation, the check is performed a > +second time to determine if the repair succeeded. whether the repair > +If errors do occur, the call returns an error status immediately. > +.TP > +.B XFS_SCRUB_OFLAG_CORRUPT > +The metadata was corrupt when the call returned. > +If > +.B XFS_SCRUB_IFLAG_REPAIR > +was specified, then an attempted repair failed to fix the problem. > +Unmount the filesystem and run > +.B xfs_repair > +to fix the filesystem. > +.TP > +.B XFS_SCRUB_OFLAG_PREEN > +The metadata is ok, but some aspect of the metadata could be optimized > +to increase performance. call again with "XFS_SCRUB_IFLAG_REPAIR" to optimize? > +.TP > +.B XFS_SCRUB_OFLAG_XFAIL > +Filesystem errors were encountered when accessing other metadata to > +cross-reference the records attached to this metadata object. > +.TP > +.B XFS_SCRUB_OFLAG_XCORRUPT > +Discrepancies were found when cross-referencing the records attached to > +this metadata object against all other available metadata in the system. > +.TP > +.B XFS_SCRUB_OFLAG_INCOMPLETE > +The checker was unable to complete its check of all records. > +.TP > +.B XFS_SCRUB_OFLAG_WARNING > +The checker encountered a metadata object with potentially problematic > +records. > +However, the records were not obviously corrupt. > +.RE > +.PP > +For metadata checkers that operate on inodes or inode metadata, the fields > +.IR sm_ino " and " sm_gen > +are the inode number and generation number of the inode to check. and where does one get the generation number? What happens if it doesn't match? What does it mean if it doesn't? This is an operational detail that may confuse many users I think. > +If the inode number is zero, the inode represented by > +.I dest_fd > +is used instead. > +.PP > +For metadata checkers that operate on allocation group metadata, the field > +.I sm_agno > +indicates the allocation group in which to find the metadata. > +.PP > +For metadata checkers that operate on filesystem-wide metadata, no > +further arguments are required. are they even allowed? > +.SH RETURN VALUE > +On error, \-1 is returned, and > +.I errno > +is set to indicate the error. > +.PP > +.SH ERRORS > +Error codes can be one of, but are not limited to, the following: > +.TP > +.B EBUSY > +The filesystem object is busy; the repair will have to be tried agai (is this only an IFLAG_REPAIR error?) > +.TP > +.B EFSCORRUPTED > +Severe filesystem corruption was detected and could not be repaired. > +Unmount the filesystem and run > +.B xfs_repair > +to fix the filesystem. Is it always severe? Does this only happen in response to IFLAG_REPAIR? (same goes for other return values, I think many are only in response to a repair request? Might be worth noting) > +.TP > +.B EINVAL > +One or more of the arguments specified is invalid. > +.TP > +.B ENOENT > +The specified metadata object does not exist. > +For example, this error code is returned for a > +.B XFS_SCRUB_TYPE_REFCNTBT > +request on a filesystem that does not support reflink. > +.TP > +.B ENOMEM > +There was not sufficient memory to perform the scrub or repair operation. > +Some operations (most notably reference count checking) require a lot of > +memory. may require large amounts of memory > +.TP > +.B ENOSPC > +There is not enough free disk space to attempt a repair. > +.TP > +.B ENOTRECOVERABLE > +Filesystem was mounted in > +.B norecovery > +mode and therefore has an unclean log. so it can't be checked, or can't be repaired? Is this returned for any ioctl on a norecovery mount, or only repair requests? > +.TP > +.B ENOTTY > +Online scrubbing or repair were not enabled. > +.TP > +.B EOPNOTSUPP > +Repairs of the requested metadata object are not supported. > +.TP > +.B EROFS > +Filesystem is read-only and a repair was requested. > +.TP > +.B ESHUTDOWN; lose ";" > +Filesystem is shut down due to previous errors. > +.SH CONFORMING TO > +This API is specific to XFS on Linux. to the XFS filesystem on the Linux kernel. > +.SH NOTES > +These operations may tie up the filesystem for a long time. again, colloquial? > +A calling process can be stop the operation by being sent a fatal word salad :D > +signal, but non-fatal signals are blocked. > +.SH SEE ALSO > +.BR ioctl (2) xfs_repair (8) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html