Sorry for the delay. Thanks for providing more information, Haotian. So this is
happening due to I/O errors caused by a flaky network connection. I can imagine
that this is perhaps a recoverable situation, but I guess when running on
physical hardware it's less likely for such I/O errors to be recoverable. I
wonder if this means we need an e2fsck.conf option - something like
"recovery_error_behavior" with a default value of "continue". For use cases
such as this, we could set it to "exit" or perhaps "retry"? (A rough sketch of
what I have in mind is at the bottom of this mail, below the quoted thread.)

On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu <liuzhiqiang26@xxxxxxxxxx> wrote:
>
> friendly ping...
>
> On 2020/12/15 15:43, Haotian Li wrote:
> > Thanks for your review. I agree with you that it's more important
> > to understand the errors found by e2fsck. We'll describe the case
> > below to illustrate this problem.
> >
> > We actually found the problem in a remote storage setup, which means
> > e2fsck's reads or writes may fail because of network packet loss.
> > The first time, some packet-loss errors happened during e2fsck's
> > journal recovery (using fsck -a), and the recovery failed. The second
> > time, we fixed the network problem and ran e2fsck again, but there
> > were still errors when we tried to mount. Then we set the jsb->s_start
> > journal field and retried e2fsck, and the problem was fixed. So we
> > suspect something is wrong in e2fsck's journal recovery, probably the
> > bug we've described in the patch.
> >
> > Certainly, exiting directly is not a good way to fix this problem.
> > Just as Harshad said, we need to tell the user what happened and let
> > the user decide whether to continue e2fsck or not. If we want to
> > safely use e2fsck without human intervention (using fsck -a), I wonder
> > if we need to provide a safe mechanism that completes the fast check
> > but avoids changes to the journal or anything else which could be
> > fixed in the future (such as the jsb->s_start flag)?
> >
> > Thanks
> > Haotian
> >
> > On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
> >> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
> >>> Hi Haotian,
> >>>
> >>> Yeah, perhaps these are the only recoverable errors. I also think
> >>> that we can't say for sure that these errors are always recoverable.
> >>> That's because in some setups these errors may still be
> >>> unrecoverable (for example, if the machine is running under low
> >>> memory). I still feel that we should ask the user whether they want
> >>> to continue or not. The reason is that, firstly, if we don't allow
> >>> running e2fsck in these cases, I wonder what the user would do with
> >>> their file system - they can't mount / can't run fsck, right?
> >>> Secondly, not doing that would be a regression. I wonder if some
> >>> setups would have chosen to ignore journal recovery if there are
> >>> errors during journal recovery, and with this fix they may start
> >>> seeing that their file systems aren't getting repaired.
> >>
> >> It may very well be that there are corrupted file system structures
> >> that could lead to ENOMEM. If so, I'd consider that something we
> >> should be explicitly checking for in e2fsck, and it's actually
> >> relatively unlikely in the jbd2 recovery code, since that's fairly
> >> straightforward --- except I'd be concerned about potential cases in
> >> your Fast Commit code, since there's quite a bit more complexity when
> >> parsing the fast commit journal.
> >>
> >> This isn't a new concern; we've already talked about the fact that
> >> fast commit needs to have a lot more sanity checks to look for
> >> maliciously generated --- or syzbot generated, which may be the same
> >> thing :-) --- inconsistent fields causing the e2fsck replay code to
> >> behave in unexpected ways, which might include trying to allocate
> >> insane amounts of memory, array buffer overruns, etc.
> >>
> >> But assuming that ENOMEM is always due to operational concerns, as
> >> opposed to file system corruption, may not always be a safe
> >> assumption.
> >>
> >> Something else to consider, from the perspective of a naive system
> >> administrator: if there is a bad media sector in the journal, simply
> >> always aborting the e2fsck run may not allow them an easy way to
> >> recover. Simply ignoring the journal and allowing the next write to
> >> occur, at which point the HDD or SSD will redirect the write to a
> >> bad-sector spare pool, will allow for an automatic recovery. Always
> >> causing e2fsck to fail would actually result in a worse outcome in
> >> this particular case.
> >>
> >> (This is especially true for a mobile device, where the owner is not
> >> likely to have access to the serial console to manually run e2fsck,
> >> and where, if they can't automatically recover, they will have to
> >> take their phone to the local cell phone carrier store for repairs
> >> --- which is *not* something that a cellular provider will enjoy, and
> >> they will tend to choose other cell phone models to feature as
> >> supported/featured devices. So an increased number of failures which
> >> can't be automatically recovered may cause the carrier to choose to
> >> feature, say, a Xiaomi phone over a ZTE phone.)
> >>
> >>> I'm wondering if you saw a situation in your setup where exiting
> >>> e2fsck helped? If possible, could you share what kind of errors were
> >>> seen in journal recovery and what the expected behavior was? Maybe
> >>> that would help us decide on the right behavior.
> >>
> >> Seconded; I think we should try to understand why it is that e2fsck
> >> is failing with these sorts of errors. It may be that there are
> >> better ways of solving the high-level problem.
> >>
> >> For example, the new libext2fs bitmap backends were something that I
> >> added because running a large number of e2fsck processes in parallel
> >> on a server machine with dozens of HDD spindles was causing e2fsck
> >> processes to run slowly due to memory contention. We fixed it by
> >> making e2fsck more memory efficient, by improving the bitmap
> >> implementations --- but if that hadn't been sufficient, I had also
> >> considered adding support to make /sbin/fsck "smarter" by limiting
> >> the number of fsck.XXX processes that would get started
> >> simultaneously, since that could actually cause the file system check
> >> to run faster by reducing memory thrashing. (The trick would have
> >> been how to make fsck smart enough to automatically tune the number
> >> of parallel fsck processes to allow, since asking the system
> >> administrator to manually tune the max number of processes would be
> >> annoying to the sysadmin, and would mean that the feature would never
> >> get used outside of $WORK in practice.)
> >>
> >> So is the actual underlying problem that e2fsck is running out of
> >> memory? If so, is it because there simply isn't enough physical
> >> memory available? Is it being run in a cgroup container which is too
> >> small?
> >> Or is it because too many file systems are being checked in
> >> parallel at the same time?
> >>
> >> Or is it I/O errors that you are concerned with? And how do you know
> >> that they are not permanent errors; is this caused by something like
> >> fibre channel connections being flaky?
> >>
> >> Or is this a hypothetical worry, as opposed to something which is
> >> causing operational problems right now?
> >>
> >> Cheers,
> >>
> >> - Ted
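To make the "recovery_error_behavior" idea at the top of this mail a bit more
concrete: what I'm picturing is an e2fsck.conf [options] entry such as
recovery_error_behavior = retry. Below is a minimal, self-contained sketch of
the semantics I have in mind. It is *not* e2fsck's actual code; the option
name, the enum values, the retry count and the helper functions here are all
assumptions for illustration only.

/*
 * Sketch only -- illustrates a proposed e2fsck.conf option
 * "recovery_error_behavior" = continue | exit | retry.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum recovery_error_behavior {
	RECOVER_CONTINUE,	/* today's behavior: ignore the journal and go on */
	RECOVER_EXIT,		/* abort e2fsck so the caller can rerun it later */
	RECOVER_RETRY,		/* retry the replay a few times before giving up */
};

/* Parse the (hypothetical) config value; the default is "continue". */
static enum recovery_error_behavior parse_behavior(const char *val)
{
	if (val && strcmp(val, "exit") == 0)
		return RECOVER_EXIT;
	if (val && strcmp(val, "retry") == 0)
		return RECOVER_RETRY;
	return RECOVER_CONTINUE;
}

/* Stand-in for the real journal replay; returns 0 on success or -EIO. */
static int replay_journal(void)
{
	return -EIO;	/* pretend a flaky network device caused an I/O error */
}

int main(void)
{
	enum recovery_error_behavior behavior = parse_behavior("retry");
	int retries = 3;
	int ret;

	for (;;) {
		ret = replay_journal();
		if (ret == 0)
			break;
		if (behavior == RECOVER_RETRY && retries-- > 0) {
			fprintf(stderr, "journal replay failed (%d), retrying\n", ret);
			sleep(1);
			continue;
		}
		if (behavior == RECOVER_EXIT) {
			fprintf(stderr, "journal replay failed (%d), exiting\n", ret);
			return 8;	/* e2fsck's "operational error" exit code */
		}
		/* "continue", or retries exhausted: fall back to ignoring the journal. */
		fprintf(stderr, "journal replay failed (%d), ignoring journal\n", ret);
		break;
	}
	return 0;
}

With "retry", a transient I/O error (e.g. from a flaky network-attached device)
gets a few more attempts before we fall back; with "exit", the run aborts with
e2fsck's operational-error exit code so a later manual run can try the replay
again; "continue" preserves today's behavior, which is what you want for
setups like the unattended mobile-device case Ted describes above.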