[Bug 14354] Bad corruption with 2.6.32-rc1 and upwards

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Fri, 23 Oct 2009 07:45:03 GMT

http://bugzilla.kernel.org/show_bug.cgi?id=14354

--- Comment #109 from Theodore Tso <tytso@xxxxxxx>  2009-10-23 07:44:59 ---
James,

>I got some corruption, but not ro changes.

When you say corruption, you mean "file data corruption", right?  You said it
was a database; was it a database that uses fsync() --- or put another way, was
the database (could you tell us what database it was) one that claims to have
ACID properties?

>Today, however, running -rc5 the box simply rebooted without notice.  and I got
>a number of log entries after the reboot...  I shut down enough to 
>unmount /var and ran e2fsck.  That generated a *slew* of errors, mostly
>complaining about multiple files claiming the same blocks.  A second e2fsck
>right after (with -f) showed no further errors.  I now have about 70 megs
>of data in lost+found.

Hmm, this sounds like the patch didn't actually help.   And am I right that you
never saw the "filesystem is readonly" plus a kernel stack dump in your system
logs or in dmesg?   The other thing which is interesting is that this happened
on a non-root filesystem (/var), which means that the journal wasn't replayed when
the root filesystem was mounted read-only, but the journal was replayed by
e2fsck.

Another question --- did you have your file system configured with "tune2fs -c
1", as described in comment #59?   One worry I have is that in fact the file
system may have been corrupted earlier, and it simply wasn't noticed right
away.   In the case of fsck complaining about blocks claimed by multiple
inodes, there are two causes of this.  One is that one or more blocks in the
inode table get written to the wrong place, overwriting other block(s) in the
inode table.  In that case, the pattern of corruption tends to be that
inode N is written on top of inode N+m*16 or N+m*32 (depending on whether you
are using 128-byte or 256-byte sized inodes) and inode N+1 is written on top of
inode (N+1)+(m*16) or (N+1)+(m*32).  It's quite easy to see this pattern from
the e2fsck transcript.
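
To illustrate the arithmetic, here is a rough sketch of checking a list of
inode pairs for this first pattern.  It assumes 4k blocks, which gives 16
inodes per inode table block with 256-byte inodes and 32 with 128-byte
inodes.  The function names are made up for this example and are not part of
e2fsprogs, and the pairs have to be pulled out of the e2fsck transcript by
hand.

```python
def inodes_per_block(block_size=4096, inode_size=256):
    """Number of on-disk inodes that fit in one inode table block.

    The 4096-byte default block size is an assumption of this sketch.
    """
    return block_size // inode_size


def looks_like_misplaced_table_block(pairs, block_size=4096, inode_size=256):
    """Check whether inode pairs fit the "inode table block written to the
    wrong place" pattern.

    pairs: (inode_a, inode_b) tuples for inodes reported as sharing blocks.
    Returns True only if every pair differs by a nonzero whole number of
    inode table blocks, i.e. inode_b == inode_a + m * inodes_per_block.
    """
    per_block = inodes_per_block(block_size, inode_size)
    if not pairs:
        return False
    for a, b in pairs:
        delta = abs(b - a)
        if delta == 0 or delta % per_block != 0:
            return False
    return True
```

A True result is only a hint that the first cause is worth a closer look;
it does not prove anything on its own.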

The second case is one where the block allocation bitmap gets corrupted, such
that some blocks which are in use are marked as free, and *then* the file
system is remounted and files are written to the file system, such that the
blocks are reallocated for new files.   In that case, the pattern of the
multiply-claimed blocks is random, and it's likely that you will see one or
more inodes where the inode is sharing blocks with more than one other inode, and
where there is no relationship between the inode numbers of inodes that are
using a particular block.
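
A rough sketch of looking for that second signature follows.  It assumes
you have already reduced the pass1b output by hand to (block, inode) pairs;
nothing here parses e2fsck output, and the function names are hypothetical.

```python
from collections import defaultdict


def sharing_partners(claims):
    """Map each inode to the set of other inodes it shares a block with.

    claims: iterable of (block_number, inode_number) pairs, one per claim
    on a block.  Inodes that share nothing are left out of the result.
    """
    owners = defaultdict(set)
    for block, inode in claims:
        owners[block].add(inode)

    partners = defaultdict(set)
    for inodes in owners.values():
        if len(inodes) > 1:
            for ino in inodes:
                partners[ino] |= inodes - {ino}
    return dict(partners)


def inodes_with_multiple_partners(claims):
    """Inodes sharing blocks with more than one other inode, sorted."""
    shared = sharing_partners(claims)
    return sorted(ino for ino, others in shared.items() if len(others) > 1)
```

If this returns a non-empty list and the inode numbers involved show no
regular spacing, that is consistent with the bitmap corruption case
described above, though it is no substitute for reading the transcript.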

So far, the fsck transcripts with pass1b that people have submitted to me tend
to be of the second form --- which is why I recommend the use of "tune2fs -c
1"; if the file system corruptions causing data loss are caused by corrupted
block allocation bitmaps, then checking the filesystems after every single boot
means that you might see pass 5 failures, but not the pass1b failures that are
associated with data loss.

Obviously, we don't want to run with "tune2fs -c 1" indefinitely, since that
slows down boot times, but for people who are interested in helping
us debug this issue, it should allow them to avoid data loss and also help us
identify *when* the file system had gotten corrupted (i.e., during the previous
boot session), and hopefully allow us to find the root cause of this problem.
