[PATCH 01/29] xfs_scrub: create online filesystem scrub program

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

Create the foundations of a filesystem scrubbing tool that asks the
kernel to inspect all metadata in the filesystem and (ultimately) to
repair anything that's broken.  Also create the man page for the
utility.

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 .gitignore                   |    1 
 Makefile                     |    3 +
 man/man8/xfs_scrub.8         |  135 ++++++++++++++++++++++++++++++++++++++++++
 scrub/Makefile               |   42 +++++++++++++
 scrub/common.c               |   20 ++++++
 scrub/common.h               |   23 +++++++
 scrub/xfs_scrub.c            |  109 ++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.h            |   23 +++++++
 tools/find-api-violations.sh |    2 -
 9 files changed, 356 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/xfs_scrub.8
 create mode 100644 scrub/Makefile
 create mode 100644 scrub/common.c
 create mode 100644 scrub/common.h
 create mode 100644 scrub/xfs_scrub.c
 create mode 100644 scrub/xfs_scrub.h


diff --git a/.gitignore b/.gitignore
index e839e2a..a3db640 100644
--- a/.gitignore
+++ b/.gitignore
@@ -68,6 +68,7 @@ cscope.*
 /repair/xfs_repair
 /rtcp/xfs_rtcp
 /spaceman/xfs_spaceman
+/scrub/xfs_scrub
 
 # generated crc files
 /libxfs/crc32selftest
diff --git a/Makefile b/Makefile
index 0dce80a..3bd0796 100644
--- a/Makefile
+++ b/Makefile
@@ -48,7 +48,7 @@ LIBFROG_SUBDIR = libfrog
 DLIB_SUBDIRS = libxlog libxcmd libhandle
 LIB_SUBDIRS = libxfs $(DLIB_SUBDIRS)
 TOOL_SUBDIRS = copy db estimate fsck growfs io logprint mkfs quota \
-		mdrestore repair rtcp m4 man doc debian spaceman
+		mdrestore repair rtcp m4 man doc debian spaceman scrub
 
 ifneq ("$(PKG_PLATFORM)","darwin")
 TOOL_SUBDIRS += fsr
@@ -91,6 +91,7 @@ repair: libxlog libxcmd
 copy: libxlog
 mkfs: libxcmd
 spaceman: libxcmd
+scrub: libhandle libxcmd
 
 ifeq ($(HAVE_BUILDDEFS), yes)
 include $(BUILDRULES)
diff --git a/man/man8/xfs_scrub.8 b/man/man8/xfs_scrub.8
new file mode 100644
index 0000000..c9df7d6
--- /dev/null
+++ b/man/man8/xfs_scrub.8
@@ -0,0 +1,135 @@
+.TH xfs_scrub 8
+.SH NAME
+xfs_scrub \- scrub the contents of an XFS filesystem
+.SH SYNOPSIS
+.B xfs_scrub
+[
+.B \-abemnTvxy
+]
+.RI "[" mount-point " | " block-device "]"
+.br
+.B xfs_scrub \-V
+.SH DESCRIPTION
+.B xfs_scrub
+attempts to check and repair all metadata in a mounted XFS filesystem.
+.PP
+.B xfs_scrub
+asks the kernel to scrub all metadata objects in the filesystem.
+Metadata records are scanned for obviously bad values and then
+cross-referenced against other metadata.
+The goal is to establish a reasonable confidence about the consistency
+of the overall filesystem by examining the consistency of individual
+metadata records against the other metadata in the filesystem.
+Damaged metadata can be rebuilt from other metadata if there exists
+redundant data structures which are intact.
+.PP
+Filesystem corruption and optimization opportunities will be logged to
+the standard error stream.
+Enabling verbose mode will increase the amount of status information
+sent to the output.
+.PP
+This utility does not know how to correct all errors.
+If the tool cannot fix the detected errors, you must unmount the
+filesystem and run
+.BR xfs_repair (8)
+to fix the problems.
+If this tool is not run with either of the
+.B \-n
+or
+.B \-y
+options, then it will optimize the filesystem when possible,
+but it will not try to fix errors.
+See the optimizations section below for a list of optimizations
+supported by this program.
+.SH OPTIONS
+.TP
+.BI \-a " errors"
+Abort if more than this many errors are found on the filesystem.
+.TP
+.B \-b
+Run in background mode.
+If the option is specified once, only run a single scrubbing thread at a
+time.
+If given more than once, an artificial delay of 100us is added to each
+scrub call to reduce CPU overhead even further.
+.TP
+.B \-e
+Specifies what happens when errors are detected.
+If
+.IR shutdown
+is given, the filesystem will be taken offline if errors are found.
+If
+.IR continue
+is given, no action is taken if errors are found; this is the default
+behavior.
+.TP
+.BI \-m " file"
+Search this file for mounted filesystems instead of /etc/mtab.
+.TP
+.B \-n
+Dry run, do not modify anything in the filesystem.
+This disables all optimization and repair behaviors.
+.TP
+.BI \-T
+Print timing and memory usage information for each phase.
+.TP
+.B \-v
+Enable verbose mode, which prints periodic status updates.
+.TP
+.B \-V
+Prints the version number and exits.
+.TP
+.B \-x
+Read all file data extents to look for disk errors.
+.B xfs_scrub
+will issue O_DIRECT reads to the block device directly.
+If the block device is a SCSI disk, it will instead issue READ VERIFY commands
+directly to the disk.
+These actions will confirm that all file data blocks can be read from storage.
+.TP
+.B \-y
+Try to repair all filesystem errors.
+If the errors cannot be fixed online, then the filesystem must be taken
+offline for repair.
+.SH OPTIMIZATIONS
+Optimizations supported by this program include:
+.IP \[bu] 2
+Updating secondary superblocks to match the primary superblock.
+.IP \[bu]
+Turning off shared block write checks for files that no longer share blocks.
+.IP \[bu]
+Instructing the underlying storage to discard unused extents via the
+.B FITRIM
+ioctl.
+.SH REPAIRS
+This program currently does not support making any repairs.
+Corruptions can only be fixed by unmounting the filesystem and running
+.BR xfs_repair (8).
+.SH EXIT CODE
+The exit code returned by
+.B xfs_scrub
+is the sum of the following conditions:
+.br
+\	0\	\-\ No errors
+.br
+\	1\	\-\ File system errors left uncorrected
+.br
+\	2\	\-\ File system optimizations possible
+.br
+\	4\	\-\ Operational error
+.br
+\	8\	\-\ Usage or syntax error
+.br
+.SH CAVEATS
+.B xfs_scrub
+is an immature utility!
+Do not run this program unless you have backups of your data!
+This program takes advantage of in-kernel scrubbing to verify a given
+data structure with locks held and can keep the filesystem busy for a
+long time.
+The kernel must be new enough to support the SCRUB_METADATA ioctl.
+.PP
+If errors are found and cannot be repaired, the filesystem must be
+unmounted and repaired.
+.SH SEE ALSO
+.BR xfs_repair (8).
diff --git a/scrub/Makefile b/scrub/Makefile
new file mode 100644
index 0000000..62cca3b
--- /dev/null
+++ b/scrub/Makefile
@@ -0,0 +1,42 @@
+#
+# Copyright (C) 2018 Oracle.  All Rights Reserved.
+#
+
+TOPDIR = ..
+include $(TOPDIR)/include/builddefs
+
+# On linux we get fsmap from the system or define it ourselves
+# so include this based on platform type.  If this reverts to only
+# the autoconf check w/o local definition, change to testing HAVE_GETFSMAP
+SCRUB_PREREQS=$(PKG_PLATFORM)
+
+ifeq ($(SCRUB_PREREQS),linux)
+LTCOMMAND = xfs_scrub
+INSTALL_SCRUB = install-scrub
+endif	# scrub_prereqs
+
+HFILES = \
+common.h \
+xfs_scrub.h
+
+CFILES = \
+common.c \
+xfs_scrub.c
+
+LLDLIBS += $(LIBHANDLE) $(LIBFROG) $(LIBPTHREAD)
+LTDEPENDENCIES += $(LIBHANDLE) $(LIBFROG)
+LLDFLAGS = -static
+
+default: depend $(LTCOMMAND)
+
+include $(BUILDRULES)
+
+install: default $(INSTALL_SCRUB)
+
+install-scrub:
+	$(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR)
+	$(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR)
+
+install-dev:
+
+-include .dep
diff --git a/scrub/common.c b/scrub/common.c
new file mode 100644
index 0000000..0a58c16
--- /dev/null
+++ b/scrub/common.c
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "common.h"
diff --git a/scrub/common.h b/scrub/common.h
new file mode 100644
index 0000000..1082296
--- /dev/null
+++ b/scrub/common.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_COMMON_H_
+#define XFS_SCRUB_COMMON_H_
+
+#endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
new file mode 100644
index 0000000..a4b9c71
--- /dev/null
+++ b/scrub/xfs_scrub.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include <stdio.h>
+#include "xfs_scrub.h"
+
+/*
+ * XFS Online Metadata Scrub (and Repair)
+ *
+ * The XFS scrubber uses custom XFS ioctls to probe more deeply into the
+ * internals of the filesystem.  It takes advantage of scrubbing ioctls
+ * to check all the records stored in a metadata object and to
+ * cross-reference those records against the other filesystem metadata.
+ *
+ * After the program gathers command line arguments to figure out
+ * exactly what the program is going to do, scrub execution is split up
+ * into several separate phases:
+ *
+ * The "find geometry" phase queries XFS for the filesystem geometry.
+ * The block devices for the data, realtime, and log devices are opened.
+ * Kernel ioctls are test-queried to see if they actually work (the scrub
+ * ioctl in particular), and any other filesystem-specific information
+ * is gathered.
+ *
+ * In the "check internal metadata" phase, we call the metadata scrub
+ * ioctl to check the filesystem's internal per-AG btrees.  This
+ * includes the AG superblock, AGF, AGFL, and AGI headers, freespace
+ * btrees, the regular and free inode btrees, the reverse mapping
+ * btrees, and the reference counting btrees.  If the realtime device is
+ * enabled, the realtime bitmap and reverse mapping btrees are checked.
+ * Quotas, if enabled, are also checked in this phase.
+ *
+ * Each AG (and the realtime device) has its metadata checked in a
+ * separate thread for better performance.  Errors in the internal
+ * metadata can be fixed here prior to the inode scan; refer to the
+ * section about the "repair filesystem" phase for more information.
+ *
+ * The "scan all inodes" phase uses BULKSTAT to scan all the inodes in
+ * an AG in disk order.  The BULKSTAT information provides enough
+ * information to construct a file handle that is used to check the
+ * following parts of every file:
+ *
+ *  - The inode record
+ *  - All three block forks (data, attr, CoW)
+ *  - If it's a symlink, the symlink target.
+ *  - If it's a directory, the directory entries.
+ *  - All extended attributes
+ *  - The parent pointer
+ *
+ * Multiple threads are started to check each the inodes of each AG in
+ * parallel.  Errors in file metadata can be fixed here; see the section
+ * about the "repair filesystem" phase for more information.
+ *
+ * Next comes the (configurable) "repair filesystem" phase.  The user
+ * can instruct this program to fix all problems encountered; to fix
+ * only optimality problems and leave the corruptions; or not to touch
+ * the filesystem at all.  Any metadata repairs that did not succeed in
+ * the previous two phases are retried here; if there are uncorrectable
+ * errors, xfs_scrub stops here.
+ *
+ * The next phase is the "check directory tree" phase.  In this phase,
+ * every directory is opened (via file handle) to confirm that each
+ * directory is connected to the root.  Directory entries are checked
+ * for ambiguous Unicode normalization mappings, which is to say that we
+ * look for pairs of entries whose utf-8 strings normalize to the same
+ * code point sequence and map to different inodes, because that could
+ * be used to trick a user into opening the wrong file.  The names of
+ * extended attributes are checked for Unicode normalization collisions.
+ *
+ * In the "verify data file integrity" phase, we employ GETFSMAP to read
+ * the reverse-mappings of all AGs and issue direct-reads of the
+ * underlying disk blocks.  We rely on the underlying storage to have
+ * checksummed the data blocks appropriately.  Multiple threads are
+ * started to check each AG in parallel; a separate thread pool is used
+ * to handle the direct reads.
+ *
+ * In the "check summary counters" phase, use GETFSMAP to tally up the
+ * blocks and BULKSTAT to tally up the inodes we saw and compare that to
+ * the statfs output.  This gives the user a rough estimate of how
+ * thorough the scrub was.
+ */
+
+/* Program name; needed for libfrog error reports. */
+char				*progname = "xfs_scrub";
+
+int
+main(
+	int			argc,
+	char			**argv)
+{
+	fprintf(stdout, "EXPERIMENTAL xfs_scrub program in use! Use at your own risk!\n");
+	return 4;
+}
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
new file mode 100644
index 0000000..ff9c24d
--- /dev/null
+++ b/scrub/xfs_scrub.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2018 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#ifndef XFS_SCRUB_XFS_SCRUB_H_
+#define XFS_SCRUB_XFS_SCRUB_H_
+
+#endif /* XFS_SCRUB_XFS_SCRUB_H_ */
diff --git a/tools/find-api-violations.sh b/tools/find-api-violations.sh
index 3b976d3..cb075ba 100755
--- a/tools/find-api-violations.sh
+++ b/tools/find-api-violations.sh
@@ -6,7 +6,7 @@
 
 # NOTE: This script doesn't look for API violations in function parameters.
 
-tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp"
+tool_dirs="copy db estimate fsck fsr growfs io logprint mdrestore mkfs quota repair rtcp scrub"
 
 # Calls to xfs_* functions in libxfs/*.c without the libxfs_ prefix
 find_possible_api_calls() {

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux