Re: NILFS: corrupt root inode after Turbo Mode?

Vyacheslav Dubeyko <slava@xxxxxxxxxxx> · Fri, 12 Oct 2012 11:10:32 +0400

Hi,

On Wed, 2012-10-10 at 22:39 +0200, Piotr Szymaniak wrote:
> maszyn ~ # nilfs-tune -l /dev/sdf3
...
> Filesystem state:         invalid or mounted
...
> Filesystem created:       Fri Aug  3 08:37:06 2012
> Last mount time:          Thu Jan  1 01:00:01 1970
> Last write time:          Thu Jan  1 01:00:01 1970
> Mount count:              56
> Maximum mount count:      50
...
> Number of segments:       922
> Device size:              7741636608
> First data block:         1
> # of blocks per segment:  2048
> Reserved segments %:      5
> Last checkpoint #:        1873
> Last block address:       734128
> Last sequence #:          358
> Free blocks count:        1150976
...

First of all, the output shows that the file system was not unmounted cleanly. There were 56 mounts, but the superblock was not updated properly during the last one. This points to a sudden power-off, a kernel crash, or a superblock that was not flushed for some reason.

Moreover, the last mount time and last write time look strange. These fields normally hold the real time of the last modification, but yours are set to the very start of 1970. The file system creation time is set by the mkfs utility, whereas the last mount time and write time are set by the driver. So this may be a slight superblock corruption.

So there is some probability that the primary superblock is inconsistent. Could you share a raw dump of the second superblock, which is located at the end of the volume? Also, could you share the dumpseg output for the segment after the last sequence # (namely, 359) and the one before it (namely, 357)?
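
For reference, the position of the second superblock can be worked out from the device size reported above. The following is only a sketch: it assumes the backup superblock starts at the last 4 KiB-aligned block of the device, which is how I remember the NILFS2 on-disk header defining it (NILFS_SB2_OFFSET_BYTES), so please check it against the headers before relying on it. The function names and the dump file name sb2.dump are made up for illustration.

```python
# Sketch: locate the secondary NILFS2 superblock from the device size.
# ASSUMPTION: offset = ((devsize >> 12) - 1) << 12, i.e. the start of the
# last whole 4 KiB unit of the device. Verify against the NILFS2 headers.

DEVICE_SIZE = 7741636608  # "Device size" from the nilfs-tune output above


def sb2_offset_bytes(device_size: int) -> int:
    """Byte offset of the assumed secondary superblock location."""
    return ((device_size >> 12) - 1) << 12


def dd_command(device: str, out_file: str, device_size: int) -> str:
    """Build a dd command line that dumps the 4 KiB unit at that offset."""
    skip = sb2_offset_bytes(device_size) // 4096
    return f"dd if={device} of={out_file} bs=4096 skip={skip} count=1"


print(sb2_offset_bytes(DEVICE_SIZE))                     # 7741632512
print(dd_command("/dev/sdf3", "sb2.dump", DEVICE_SIZE))  # ... skip=1890047 count=1
```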

On Thu, 2012-10-11 at 20:03 +0200, Piotr Szymaniak wrote:
> On Thu, Oct 11, 2012 at 02:12:00PM +0400, Vyacheslav Dubeyko wrote:
> > On Thu, 2012-10-11 at 11:23 +0200, Piotr Szymaniak wrote:
> > > So I should dump block 743205 and 734158? Ok, but I'm not familiar with
> > > blkoff.
> > > 
> > > ie. blkoff = 2, blocknr = 734205 it means the dump should be block
> > > 734205 and (blkoff = 2) next two blocks? If this is correct then this
> > > should work, right?
> > > dd if=/dev/<device> of=<dump> bs=2048 skip=734205 count=3 (not sure if
> > > dd counts skip= and count= from 0 or 1)
> > > And similar with block 734158 but only one block (blkoff = 0)?
> > > 
> > 
> > Sorry, I confuse you by blkoff mentioning. The blkoff is logical offset from file begin (for example, from ifile begin). You need to take into account only blocknr.
> > 
> > It needs to dump block 743205 and 734158. Your superblock's content informs that block size is 4096 bytes.
> > 
> > For example, dd if=/dev/<device> of=<dump> bs=4096 skip=734205 count=1
> 
> Block 743205 is empty:
> maszyn tmp (: cat dump.743205
> maszyn tmp (: hexdump dump.743205
> 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> *
> 0001000
> 

That is bad. This block should contain the root inode description, so it is clear why you get that error message when trying to mount.
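
If it is easier than running dd and hexdump for every block, the same check can be scripted. This is a minimal sketch, assuming the 4096-byte block size from your superblock and block numbers counted from 0 (the same way dd counts skip=); the helper names are hypothetical and not part of nilfs-utils.

```python
# Sketch: read one file system block from a device or image file and
# report whether it contains only zero bytes.

BLOCK_SIZE = 4096  # block size reported by the superblock


def read_block(path: str, blocknr: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read block `blocknr` (0-based, like dd's skip=) from `path`."""
    with open(path, "rb") as dev:
        dev.seek(blocknr * block_size)
        return dev.read(block_size)


def is_empty_block(data: bytes) -> bool:
    """True if some data was read and every byte of it is zero."""
    return len(data) > 0 and not any(data)


# Example use (needs read access to the device or to a raw image of it):
#   is_empty_block(read_block("/dev/sdf3", 734158))
```

A short read past the end of the device returns an empty byte string, which is deliberately not reported as an empty block.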

I think we need to understand how badly the metadata files are damaged in the last checkpoints. The dumpseg output for segment #358 describes all blocks of the ifile (ino = 6). Could you check which of those blocks are empty and which are not? Could you share the results of checking the ifile (ino = 6) blocks for all checkpoints described in the dumpseg output for segment #358?

We need to find a valid (non-empty) ifile block with blkoff = 2 in the previous checkpoints. Could you try to find one? To do so, run dumpseg on the previous segments and search for a non-empty block of the ifile (ino = 6) with blkoff = 2.
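
The search through the dumpseg output could be sketched roughly as follows. It assumes that the output has a line containing "ino = N" for each file, followed by lines containing "blkoff = X, blocknr = Y" for its blocks, which is the notation used earlier in this thread; the real dumpseg layout may differ, so treat the regular expressions as a starting point. The function name is hypothetical.

```python
# Sketch: list the disk block numbers recorded in saved dumpseg output
# for a given inode number and logical block offset.
# ASSUMPTION: "ino = N" lines introduce a file, and the block lines
# that follow contain "blkoff = X, blocknr = Y".
import re
from typing import List

INO_RE = re.compile(r"\bino = (\d+)")
BLK_RE = re.compile(r"\bblkoff = (\d+), blocknr = (\d+)")


def find_blocks(dumpseg_text: str, ino: int = 6, blkoff: int = 2) -> List[int]:
    """Return blocknr values for `ino` whose logical offset equals `blkoff`."""
    current_ino = None
    found = []
    for line in dumpseg_text.splitlines():
        ino_match = INO_RE.search(line)
        if ino_match:
            current_ino = int(ino_match.group(1))
        blk_match = BLK_RE.search(line)
        if blk_match and current_ino == ino and int(blk_match.group(1)) == blkoff:
            found.append(int(blk_match.group(2)))
    return found
```

Each block number it returns can then be dumped and checked for being non-empty.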

> Block 734158 attached.
> 

As far as I can see, block 734158 contains the directory entries of the root folder (ino = 2): bin, boott, dev, etc, home5, lib, media, mnt, opt, proc, root, run, sbin, sys, tmp, usr, var, test.318.

So I can distinguish two problems: (1) recovering the file system and (2) investigating the issue.

Could you save a raw dump of the corrupted file system in its untouched state for further investigation?

I think the cause of this situation is that dirty blocks were not flushed, or were flushed improperly. The question is why. I can see several possible reasons: (1) a hardware issue, (2) a kernel crash, (3) a file system issue. In any case, we need a description of how to reproduce the issue in order to understand the cause. Could you reproduce the issue and describe the steps?

With the best regards,
Vyacheslav Dubeyko.
