Re: Recovery after mkfs.ext4 on a ext4

"Theodore Ts'o" <tytso@xxxxxxx> · Mon, 23 Jun 2014 13:31:51 -0400

On Mon, Jun 23, 2014 at 06:37:20PM +0200, Killian De Volder wrote:
> On 23-06-14 14:37, Theodore Ts'o wrote:
> > On Mon, Jun 23, 2014 at 08:09:37AM +0200, Killian De Volder wrote:
> >> It's still checking, due to the high amount of RAM it's using.
> >> However, if I start a parallel check with -nf, will it find other errors that the one with the high memory usage hasn't found yet?
> > No, definitely not that!  Running two e2fsck's in parallel will do far
> > more harm than good.
> "In parallel" is a big word: the repair check is SOOO slow that it might as well have been killed by the time the second (read-only) test is done.
> I once had an OOM because too much ZRAM was allocated; after I restarted e2fsck, it found more errors before going into massive RAM usage.
> So I was wondering what would happen if I restarted it.
> >
> >> Should I start a new one, or is this not advised?
> >> Sometimes I think it's bad inodes causing artificial memory usage.
> > What part of the e2fsck run are you in?  If you are in passes
> > 1b/1c/1d, then one of the things you can do is to analyze the log
> Pass 1: Checking inodes, blocks, and sizes
> Nothing else below this except things like:
> 
> Too many illegal blocks in inode 488.
> Clear inode<y>? yes

Does it stop after one of these messages without displaying anything
else?  Or does it just continue emitting a large number of these
messages?  And is the time between each one getting longer and longer?
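
One way to answer the last question, assuming the e2fsck output was
captured with a timestamp in front of every line, is to measure the gap
between successive "Too many illegal blocks" messages.  The sketch below
is only an illustration: the log format (epoch seconds, a space, then
the message) and both function names are assumptions, not something
e2fsck produces or provides by itself.

```python
def prompt_gaps(lines, marker="Too many illegal blocks in inode"):
    """Return the seconds elapsed between successive lines containing marker.

    Assumed (hypothetical) line format: "<epoch seconds> <e2fsck message>".
    """
    stamps = []
    for line in lines:
        stamp, _, message = line.partition(" ")
        if marker in message:
            stamps.append(float(stamp))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def is_slowing_down(gaps):
    """True if no gap is shorter than the one before it and the last gap
    is longer than the first."""
    if len(gaps) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(gaps, gaps[1:]))
    return never_shrinks and gaps[-1] > gaps[0]
```

Steadily growing gaps would point at the bookkeeping getting more
expensive; no new messages at all would point at a loop on one inode.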

We do actually keep a linked list of these inode numbers so we can try
to report a directory name so you know which file has been trashed.
This happens in pass #2, so the inodes which are invalid are stored in
pass #1 and only removed in pass #2.  
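
To make the deferral concrete, here is a toy model of it.  This is a
sketch under stated assumptions, not e2fsck's actual code or data
structures: the function names and the (inode, valid) and (pathname,
inode) tuples are hypothetical stand-ins.  It only shows why the list
built in pass #1 keeps growing until pass #2 has run.

```python
def pass1(inodes):
    """Pass 1 (model): remember the number of every inode judged invalid.

    `inodes` is an iterable of (inode_number, is_valid) pairs, a stand-in
    for the real on-disk checks. Nothing is released here, so the list
    grows with every bad inode found.
    """
    return [number for number, is_valid in inodes if not is_valid]


def pass2(bad_inodes, dir_entries):
    """Pass 2 (model): walk directory entries and name each bad inode.

    `dir_entries` is an iterable of (pathname, inode_number) pairs.
    Returns the (inode, pathname) pairs that could be named, plus the
    bad inodes that no directory entry referred to.
    """
    pending = set(bad_inodes)
    named = []
    for pathname, number in dir_entries:
        if number in pending:
            named.append((number, pathname))
            pending.discard(number)
    return named, sorted(pending)
```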

So if you are seeing gazillions of bad inodes, that could very easily
be what's going on.  If so, I can imagine having some mode that we
enter after a hundred inodes where we just ask permission to blow away
all of the corrupted inodes in pass #1, without waiting until we can
give you a proper pathname.
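
Such a mode does not exist; purely as an illustration of the idea, it
could look something like the sketch below.  The function, the `ask`
callback, and the `threshold` parameter are all hypothetical, with the
default of 100 taken from the "hundred inodes" above.

```python
def handle_bad_inodes(bad_inodes, ask, threshold=100):
    """Split bad inodes into those deferred to pass 2 and those cleared at once.

    `ask` is a hypothetical callback that returns True if the user agrees
    to clear all further corrupted inodes without waiting for pathnames.
    It is called at most once, when the deferred list reaches `threshold`.
    """
    deferred, cleared_now = [], []
    bulk_mode = False
    asked = False
    for number in bad_inodes:
        if bulk_mode:
            # Permission was given: clear immediately, store nothing.
            cleared_now.append(number)
            continue
        deferred.append(number)
        if not asked and len(deferred) >= threshold:
            asked = True
            bulk_mode = ask()
    return deferred, cleared_now
```

If permission is refused, everything keeps being deferred as before, so
the memory cost stays the same as it is today.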

The other possibility is that a particular inode is so badly
corrupted that we're looping while trying to evaluate it.
That's why I'm asking if e2fsck has just stopped and is not printing
any more messages, in what might be an apparent infinite loop.

						 - Ted