Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal_recover() failed

Zhiqiang Liu <liuzhiqiang26@xxxxxxxxxx> · Fri, 25 Dec 2020 09:49:26 +0800

friendly ping...

On 2020/12/15 15:43, Haotian Li wrote:
> Thanks for your review. I agree with you that it's more important
> to understand the errors found by e2fsck. We'll describe the case
> behind this problem below.
> 
> The problem we found actually occurs in a remote storage case. That is,
> e2fsck's reads or writes may fail because of network packet loss.
> The first time, some packet loss errors happened during e2fsck's journal
> recovery (using fsck -a), and the recovery failed. The second time, we
> fixed the network problem and ran e2fsck again, but we still got errors
> when we tried to mount. Then we set the jsb->s_start journal flags and
> reran e2fsck, and the problem was fixed. So we suspect something is wrong
> in e2fsck's journal recovery, probably the bug we've described in the patch.
> 
> Certainly, exiting directly is not a good way to fix this problem.
> Just as Harshad said, we need to tell the user what happened and let the
> user decide whether to continue e2fsck or not. If we want to safely use
> e2fsck without human intervention (using fsck -a), I wonder if we need
> to provide a safe mechanism to complete the fast check but avoid changes
> to the journal or anything else which may be fixed in the future (such
> as the jsb->s_start flag)?
> 
> Thanks
> Haotian
> 
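
To make the behaviour asked about above concrete, here is a minimal sketch
of "do not touch the recovery state when replay failed". It is a toy model
in Python, not e2fsprogs code: JournalState, finish_recovery and the
recover_err argument are hypothetical names, and the only on-disk facts
assumed are that a zero s_start means an empty journal and that the file
system carries a needs_recovery flag.

```python
from dataclasses import dataclass


@dataclass
class JournalState:
    """Toy stand-in for the recovery state (hypothetical, not e2fsprogs types)."""
    s_start: int           # nonzero: the journal still holds transactions to replay
    needs_recovery: bool   # models the needs_recovery feature flag


def finish_recovery(state: JournalState, recover_err: int) -> JournalState:
    """Return the state to write back after a replay attempt.

    recover_err models the result of the replay step: 0 on success,
    a nonzero errno-style code on failure.
    """
    if recover_err != 0:
        # Replay failed (for example a transient I/O error): leave the
        # journal marked as needing recovery so a later run can retry.
        return state
    # Replay succeeded: the journal is empty and the flag can be cleared.
    return JournalState(s_start=0, needs_recovery=False)
```

With this shape, a second run after the network is fixed would still see
a journal that needs replay, instead of one that was marked clean by the
failed first run.
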
> On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
>> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
>>> Hi Haotian,
>>>
>>> Yeah, perhaps these are the only recoverable errors. I also think that
>>> we can't say for sure that these errors are always recoverable. That's
>>> because in some setups, these errors may still be unrecoverable (for
>>> example, if the machine is running under low memory). I still feel
>>> that we should ask the user about whether they want to continue or
>>> not. The reason is that firstly if we don't allow running e2fsck in
>>> these cases, I wonder what the user would do with their file system -
>>> they can't mount / can't run fsck, right? Secondly, not doing that
>>> would be a regression. I wonder if some setups would have chosen to
>>> ignore journal recovery if there are errors during journal recovery
>>> and with this fix they may start seeing that their file systems aren't
>>> getting repaired.
>>
>> It may very well be that there are corrupted file system structures
>> that could lead to ENOMEM.  If so, I'd consider that something we should
>> be explicitly checking for in e2fsck, and it's actually relatively
>> unlikely in the jbd2 recovery code, since that's fairly straight
>> forward --- except I'd be concerned about potential cases in your Fast
>> Commit code, since there's quite a bit more complexity when parsing
>> the fast commit journal.
>>
>> This isn't a new concern; we've already talked about the fact that
>> fast commit needs to have a lot more sanity checks to look for
>> maliciously --- or syzbot generated, which may be the same thing :-)
>> --- inconsistent fields causing the e2fsck replay code to behave in
>> unexpected ways, which might include trying to allocate insane amounts
>> of memory, array buffer overruns, etc.
>>
>> But assuming that ENOMEM is always due to operational concerns, as
>> opposed to file system corruption, may not always be a safe
>> assumption.
>>
>> Something else to consider is from the perspective of a naive system
>> administrator, if there is a bad media sector in the journal, simply
>> always aborting the e2fsck run may not allow them an easy way to
>> recover.  Simply ignoring the journal and allowing the next write to
>> occur, at which point the HDD or SSD will redirect the write to a bad
>> sector spare pool, will allow for an automatic recovery.  Simply
>> always causing e2fsck to fail would actually result in a worse
>> outcome in this particular case.
>>
>> (This is especially true for a mobile device, where the owner is not
>> likely to have access to the serial console to manually run e2fsck,
>> and where if they can't automatically recover, they will have to take
>> their phone to the local cell phone carrier store for repairs ---
>> which is *not* something that a cellular provider will enjoy, and they
>> will tend to choose other cell phone models to feature as
>> supported/featured devices.  So an increased number of failures which
>> can't be automatically recovered may cause the carrier to choose to
>> feature, say, a Xiaomi phone over a ZTE phone.)
>>
>>> I'm wondering if you saw a situation in your setup where exiting
>>> e2fsck helped? If possible, could you share what kind of errors were
>>> seen in journal recovery and what was the expected behavior? Maybe
>>> that would help us decide on the right behavior.
>>
>> Seconded; I think we should try to understand why it is that e2fsck is
>> failing with these sorts of errors.  It may be that there are better
>> ways of solving the high-level problem.
>>
>> For example, the new libext2fs bitmap backends were something that I
>> added because running a large number of e2fsck processes in
>> parallel on a server machine with dozens of HDD spindles was causing
>> e2fsck processes to run slowly due to memory contention.  We fixed it
>> by making e2fsck more memory efficient, by improving the bitmap
>> implementations --- but if that hadn't been sufficient, I had also
>> considered adding support to make /sbin/fsck "smarter" by limiting the
>> number of fsck.XXX processes that would get started simultaneously,
>> since that could actually cause the file system check to run faster by
>> reducing memory thrashing.  (The trick would have been how to make
>> fsck smart enough to automatically tune the number of parallel fsck
>> processes to allow, since asking the system administrator to manually
>> tune the max number of processes would be annoying to the sysadmin,
>> and would mean that the feature would never get used outside of $WORK
>> in practice.)
>>
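
The idea of capping the number of simultaneous fsck.XXX processes can be
sketched as below. This is only an illustration under stated assumptions:
max_parallel_checks and run_checks are hypothetical helpers, not part of
/sbin/fsck, and per_check_bytes is an estimate the caller must supply
(the thread does not say how fsck would obtain it).

```python
from concurrent.futures import ThreadPoolExecutor


def max_parallel_checks(available_bytes: int, per_check_bytes: int,
                        n_filesystems: int) -> int:
    """Pick how many checks to run at once so their estimated memory fits."""
    if per_check_bytes <= 0:
        raise ValueError("per_check_bytes must be positive")
    fit = available_bytes // per_check_bytes
    # Always allow one check, and never more than there are file systems.
    return max(1, min(fit, n_filesystems))


def run_checks(devices, check_one, limit):
    """Run check_one(device) for every device, at most `limit` at a time.

    Results come back in the same order as `devices`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(check_one, devices))
```

In practice check_one would wrap whatever launches the per-device check;
the sketch only shows the bounding, not the automatic tuning that is
described above as the hard part.
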
>> So is the actual underlying problem that e2fsck is running out of
>> memory?  If so, is it because there simply isn't enough physical
>> memory available?  Is it being run in a cgroup container which is too
>> small?  Or is it because too many file systems are being checked in
>> parallel at the same time?  
>>
>> Or is it I/O errors that you are concerned with?  And how do you know
>> that they are not permanent errors; is this caused by something like
>> fibre channel connections being flaky?
>>
>> Or is this a hypothetical worry, as opposed to something which is
>> causing operational problems right now?
>>
>> Cheers,
>>
>> 					- Ted