Re: [PATCH 7/7] xfs_scrub: create a new category for unfixable errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 01, 2019 at 04:32:28PM -0500, Eric Sandeen wrote:
> On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > There's nothing that xfs_scrub (or XFS) can do about media errors for
> > data file blocks -- the data are gone.  Create a new category for these
> > unfixable errors so that we don't advise the user to take further action
> > that won't fix the problem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> 
> are "unfixable errors" /exclusively/ file data media errors?

The intent behind this new classification is for things that we can't
ever fix or rebuild in the filesystem.  However, at least for now, file
data loss is the only source of unfixable errors.

> > diff --git a/scrub/phase4.c b/scrub/phase4.c
> > index 1cf3f6b7..a276bc32 100644
> > --- a/scrub/phase4.c
> > +++ b/scrub/phase4.c
> > @@ -99,7 +99,10 @@ xfs_process_action_items(
> >  	workqueue_destroy(&wq);
> >  
> >  	pthread_mutex_lock(&ctx->lock);
> > -	if (moveon && ctx->corruptions_found == 0 && want_fstrim) {
> > +	if (moveon &&
> > +	    ctx->corruptions_found == 0 &&
> > +	    ctx->unfixable_errors == 0 &&
> > +	    want_fstrim) {
> >  		fstrim(ctx);
> >  		progress_add(1);
> >  	}
> 
> 
> why would a file data media error preclude fstrim?

If there's even the slightest hint of things still being wrong with the
filesystem or the underlying storage, we should avoid writing to the
storage (e.g. FITRIM) because that could screw things up even more.

Note that this fstrim happens at the end of phase 4, so that means that
if we have any corruptions or unfixable errors at this point, they're in
the metadata, our attempts to fix them have failed, and so we absolutely
should not go writing more things to the disk.

> > diff --git a/scrub/phase5.c b/scrub/phase5.c
> > index dc0ee5e8..e0c4a3df 100644
> > --- a/scrub/phase5.c
> > +++ b/scrub/phase5.c
> > @@ -336,7 +336,7 @@ xfs_scan_connections(
> >  	bool			moveon = true;
> >  	bool			ret;
> >  
> > -	if (ctx->corruptions_found) {
> > +	if (ctx->corruptions_found || ctx->unfixable_errors) {
> >  		str_info(ctx, ctx->mntpoint,
> >  _("Filesystem has errors, skipping connectivity checks."));
> 
> why would a file data media error stop the connectivity check?

It won't, because the media scan happens during phase 6.

> unless "unfixable" may be other types, in which case it makes sense.

They will, presumably.  For an unfixable error to stop the connectivity
checks (phase 5), the unfixable error would have to be observed during
phases 1-4.  Those first four phases exclusively look for (and repair)
internal fs metadata, so if we reach phase 5 with an unfixable error
that means the fs metadata are trashed and xfs_scrub is going to fail
anyway.

Note that I haven't actually tried to write a directory repair function
yet.  The last time I asked viro about it he wondered what in the hell I
was going to do about the dentry cache and doubted that it could be
done... so either I have to prove him wrong, or maybe directories /will/
become the first source of unfixable errors for phase 1-4.

So yes, the 'unfixable' category is supposed to be broad enough to cover
more than just file data loss, though for now it's only used for that.

--D

> But the commit log indicates it's just for a file data media error.
> 
> What's the intent?
>
> -Eric



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux