On Thu, Aug 11, 2022 at 10:20:12AM +1000, Dave Chinner wrote:
> On Sun, Aug 07, 2022 at 11:30:28AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Start the fourth chapter of the online fsck design documentation, which
> > discusses the user interface and the background scrubbing service.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  .../filesystems/xfs-online-fsck-design.rst |  114 ++++++++++++++++++++
> >  1 file changed, 114 insertions(+)
> > 
> > 
> > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
> > index d630b6bdbe4a..42e82971e036 100644
> > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > @@ -750,3 +750,117 @@ Proposed patchsets include `general stress testing
> > <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
> > and the `evolution of existing per-function stress testing
> > <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
> > +
> > +4. User Interface
> > +=================
> > +
> > +The primary user of online fsck is the system administrator, just like offline
> > +repair.
> > +Online fsck presents two modes of operation to administrators:
> > +A foreground CLI process for online fsck on demand, and a background service
> > +that performs autonomous checking and repair.
> > +
> > +Checking on Demand
> > +------------------
> > +
> > +For administrators who want the absolute freshest information about the
> > +metadata in a filesystem, ``xfs_scrub`` can be run as a foreground process on
> > +a command line.
> > +The program checks every piece of metadata in the filesystem while the
> > +administrator waits for the results to be reported, just like the existing
> > +``xfs_repair`` tool.
> > +Both tools share a ``-n`` option to perform a read-only scan, and a ``-v``
> > +option to increase the verbosity of the information reported.
> > +
> > +A new feature of ``xfs_scrub`` is the ``-x`` option, which employs the error
> > +correction capabilities of the hardware to check data file contents.
> > +The media scan is not enabled by default because it may dramatically increase
> > +program runtime and consume a lot of bandwidth on older storage hardware.
> 
> So '-x' runs a media scrub command?  What does that do with software
> RAID?

Nothing special unless the RAID controller itself does parity checking
of reads -- the kernel doesn't have any API calls (that I know of) to do
that.  I think md-raid5 will check the parity, but afaict nothing else
(raid1) does that.

> Does that trigger parity checks of the RAID volume, or pass
> through to the underlying hardware to do physical media scrub?

Chaitanya proposed a userspace api so that xfs_scrub could actually ask
the hardware to perform a media verification[1], but willy pointed out
that none of the device protocols have a means for the device to prove
that it did anything, so it stalled.

[1] https://lore.kernel.org/linux-fsdevel/20220713072019.5885-1-kch@xxxxxxxxxx/

> Or maybe both?

I wish. :)

> Rewriting the paragraph to be focussed around the functionality
> being provided (i.e "media scrubbing is a new feature of xfs_scrub.
> It provides .....")

Er... are you doing that, or asking me to do it?

> > +The output of a foreground invocation is captured in the system log.
> 
> At what log level?

That depends on the message, but right now it only uses
LOG_{ERR,WARNING,INFO}.  Errors, corruptions, and unfixable problems are
LOG_ERR.  Warnings are LOG_WARNING.  Notices of information, repairs
completed, and optimizations made are all recorded with LOG_INFO.

> > +The ``xfs_scrub_all`` program walks the list of mounted filesystems and
> > +initiates ``xfs_scrub`` for each of them in parallel.
> > +It serializes scans for any filesystems that resolve to the same top level
> > +kernel block device to prevent resource overconsumption.
> 
> Is this serialisation necessary for non-HDD devices?

That ultimately depends on the preferences of the sysadmins, but for the
initial push I'd rather err on the side of using fewer iops on a running
system.

> > +Background Service
> > +------------------
> > +
> > +To reduce the workload of system administrators, the ``xfs_scrub`` package
> > +provides a suite of `systemd <https://systemd.io/>`_ timers and services that
> > +run online fsck automatically on weekends.
> 
> Weekends change depending on where you are in the world, right? So
> maybe this should be more explicit?

Sunday at 3:10am, whenever that is in the local time zone.

> [....]
> 
> > +**Question**: Should the health reporting integrate with the new inotify fs
> > +error notification system?
> 
> Can the new inotify fs error notification system report complex
> health information structures?

In theory, yes, said the authors.

> How much pain is involved in making
> it do what we want, considering we already have a health reporting
> ioctl that can be polled?

I haven't tried this myself, but I think it involves defining a new type
code and message length within the inotify system.  The last time I
looked at the netlink protocol, I /think/ I saw that the consuming
programs will read the header, see that there's a type code and a buffer
length, and decide to use it or skip it.

That said, there were some size and GFP_ limits on what could be sent,
so I don't know how difficult it would be to make this part actually
work in practice.  Gabriel said it wouldn't be too difficult once I was
ready.

> > +**Question**: Would it be helpful for sysadmins to have a daemon to listen for
> > +corruption notifications and initiate a repair?
> 
> Seems like an obvious extension to the online repair capability.

...too bad there are dragons thataways.
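(Going back to the log-level question earlier in this mail: the mapping
boils down to something like the sketch below.  Python for brevity; the
real tool is C, and the category names here are hypothetical labels for
the message classes described above.)

```python
import syslog

# Sketch of the message-class -> syslog priority mapping described
# above.  Errors, corruptions, and unfixable problems go to LOG_ERR;
# warnings to LOG_WARNING; informational notices, completed repairs,
# and optimizations to LOG_INFO.
SCRUB_LOG_PRIORITY = {
    "error":        syslog.LOG_ERR,
    "corruption":   syslog.LOG_ERR,
    "unfixable":    syslog.LOG_ERR,
    "warning":      syslog.LOG_WARNING,
    "info":         syslog.LOG_INFO,
    "repair":       syslog.LOG_INFO,
    "optimization": syslog.LOG_INFO,
}

def log_scrub_message(category, message):
    """Record a scrub result in the system log at the right priority."""
    syslog.syslog(SCRUB_LOG_PRIORITY[category], message)
```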
--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx