On Fri, Nov 01, 2019 at 04:32:28PM -0500, Eric Sandeen wrote: > On 10/22/19 1:49 PM, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > There's nothing that xfs_scrub (or XFS) can do about media errors for > > data file blocks -- the data are gone. Create a new category for these > > unfixable errors so that we don't advise the user to take further action > > that won't fix the problem. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > are "unfixable errors" /exclusively/ file data media errors? The intent behind this new classification is for things that we can't ever fix or rebuild in the filesystem. However, at least for now, file data loss is the only source of unfixable errors. > > diff --git a/scrub/phase4.c b/scrub/phase4.c > > index 1cf3f6b7..a276bc32 100644 > > --- a/scrub/phase4.c > > +++ b/scrub/phase4.c > > @@ -99,7 +99,10 @@ xfs_process_action_items( > > workqueue_destroy(&wq); > > > > pthread_mutex_lock(&ctx->lock); > > - if (moveon && ctx->corruptions_found == 0 && want_fstrim) { > > + if (moveon && > > + ctx->corruptions_found == 0 && > > + ctx->unfixable_errors == 0 && > > + want_fstrim) { > > fstrim(ctx); > > progress_add(1); > > } > > > why would a file data media error preclude fstrim? If there's even the slightest hint of things still being wrong with the filesystem or the underlying storage, we should avoid writing to the storage (e.g. FITRIM) because that could screw things up even more. Note that this fstrim happens at the end of phase 4, so that means that if we have any corruptions or unfixable errors at this point, they're in the metadata, our attempts to fix them have failed, and so we absolutely should not go writing more things to the disk. > > diff --git a/scrub/phase5.c b/scrub/phase5.c > > index dc0ee5e8..e0c4a3df 100644 > > --- a/scrub/phase5.c > > +++ b/scrub/phase5.c > > @@ -336,7 +336,7 @@ xfs_scan_connections( > > bool moveon = true; > > bool ret; > > > > - if (ctx->corruptions_found) { > > + if (ctx->corruptions_found || ctx->unfixable_errors) { > > str_info(ctx, ctx->mntpoint, > > _("Filesystem has errors, skipping connectivity checks.")); > > why would a file data media error stop the connectivity check? It won't, because the media scan happens during phase 6. > unless "unfixable" may be other types, in which case it makes sense. They will, presumably. For an unfixable error to stop the connectivity checks (phase 5), the unfixable error would have to be observed during phases 1-4. Those first four phases exclusively look for (and repair) internal fs metadata, so if we reach phase 5 with an unfixable error that means the fs metadata are trashed and xfs_scrub is going to fail anyway. Note that I haven't actually tried to write a directory repair function yet. The last time I asked viro about it he wondered what in the hell I was going to do about the dentry cache and doubted that it could be done... so either I have to prove him wrong, or maybe directories /will/ become the first source of unfixable errors for phase 1-4. So yes, the 'unfixable' category is supposed to be broad enough to cover more than just file data loss, though for now it's only used for that. --D > But the commit log indicates it's just for a file data media error. > > What's the intent? > > -Eric