Re: [PATCH 03/14] xfs: document the testing plan for online fsck

On Thu, Aug 11, 2022 at 10:09:45AM +1000, Dave Chinner wrote:
> On Sun, Aug 07, 2022 at 11:30:22AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Start the third chapter of the online fsck design documentation.  This
> > covers the testing plan to make sure that both online and offline fsck
> > can detect arbitrary problems and correct them without making things
> > worse.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  .../filesystems/xfs-online-fsck-design.rst         |  187 ++++++++++++++++++++
> >  1 file changed, 187 insertions(+)
> 
> 
> ....
> > +Stress Testing
> > +--------------
> > +
> > +A unique requirement to online fsck is the ability to operate on a filesystem
> > +concurrently with regular workloads.
> > +Although it is of course impossible to run ``xfs_scrub`` with *zero* observable
> > +impact on the running system, the online repair code should never introduce
> > +inconsistencies into the filesystem metadata, and regular workloads should
> > +never notice resource starvation.
> > +To verify that these conditions are being met, fstests has been enhanced in
> > +the following ways:
> > +
> > +* For each scrub item type, create a test to exercise checking that item type
> > +  while running ``fsstress``.
> > +* For each scrub item type, create a test to exercise repairing that item type
> > +  while running ``fsstress``.
> > +* Race ``fsstress`` and ``xfs_scrub -n`` to ensure that checking the whole
> > +  filesystem doesn't cause problems.
> > +* Race ``fsstress`` and ``xfs_scrub`` in force-rebuild mode to ensure that
> > +  force-repairing the whole filesystem doesn't cause problems.
> > +* Race ``xfs_scrub`` in check and force-repair mode against ``fsstress`` while
> > +  freezing and thawing the filesystem.
> > +* Race ``xfs_scrub`` in check and force-repair mode against ``fsstress`` while
> > +  remounting the filesystem read-only and read-write.
> > +* The same, but running ``fsx`` instead of ``fsstress``.  (Not done yet?)
> 
> I had a thought when reading this: we want to ensure that online
> repair handles concurrent grow/shrink operations without causing
> problems, and that it copes with concurrent attempts to run
> independent online repair processes.
> 
> Not sure that comes under stress testing, but it was the "test while
> freeze/thaw" that triggered me to think of this, so that's where I'm
> commenting about it. :)

Hmm.  I hadn't really given that much thought.  Let me go add that to
the test suite and see how many daemons come pouring out...
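
For illustration only, here is a rough sketch of the "race a workload
against a checker" pattern that the bullets above describe.  This is not
the fstests implementation; the race() helper, its parameters, and the
paths in the commented invocation are all hypothetical, and a real test
would also have to handle mount setup, freeze/thaw, and cleanup.

```python
import subprocess
import sys
import time


def race(workload_cmd, racer_cmd, duration_s=30.0, pause_s=0.1):
    """Run workload_cmd in the background and repeatedly run racer_cmd
    against it until the workload exits or duration_s elapses.

    Returns (racer_runs, racer_failures, workload_returncode).
    A racer run counts as a failure if it exits non-zero.
    """
    workload = subprocess.Popen(workload_cmd)
    deadline = time.monotonic() + duration_s
    runs = 0
    failures = 0
    try:
        while time.monotonic() < deadline and workload.poll() is None:
            result = subprocess.run(racer_cmd)
            runs += 1
            if result.returncode != 0:
                failures += 1
            time.sleep(pause_s)
    finally:
        # Stop the workload if it is still running, then reap it.
        if workload.poll() is None:
            workload.terminate()
        workload.wait()
    return runs, failures, workload.returncode


# Hypothetical invocation; the mount point and the fsstress arguments
# are assumptions, not values taken from fstests:
#
# race(["fsstress", "-d", "/mnt/scratch/stress", "-n", "100000", "-p", "4"],
#      ["xfs_scrub", "-n", "/mnt/scratch"])
```

The same helper could in principle take a grow/shrink command or a
second scrub instance as the racer, which is the case Dave raises above.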

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx


