Re: Segment magic number invalid

Jiro SEKIBA <jir@xxxxxxxxx> · Wed, 19 May 2010 22:49:30 +0900

Hi,

At Mon, 17 May 2010 16:26:14 +0900 (JST),
Ryusuke Konishi wrote:
> 
> Hi,
> On Mon, 17 May 2010 00:14:59 -0400, Paul L wrote:
> > Wow, that does the trick! Here is the output after I changed the number to 500.
> > 
> > Super-block:
> >     revision = 2.0
> >     blocksize = 4096
> >     write time = 2010-05-14 13:58:33
> >     indicated log: blocknr = 1608334
> >         segnum = 785, seq = 307637, cno=671100
> > 
> > Clean FS.
> > The latest log is lost. Trying rollback recovery..
> > ...................
> > Searching the latest checkpoint.
> > Selected log: blocknr = 1304576
> >     segnum = 637, seq = 307489, cno=670854
> >     creation time = 2010-05-14 09:48:18
> > Do you wish to overwrite super block (y/N)? y
> > Recovery will complete on mount.
> > 
> > I then mounted /home, and things seem to work fine! Can I be sure that
> > everything will be OK from now on? Or should I back up and reformat?
> > Thanks!
> 
> That was good! 
> 
> I recommend that you back up now and reformat the partition, since
> recovery rewound the logs, and blocks still used by the recovered
> (older) checkpoint may have been overwritten later by GC.  (If GC was
> suspended, the filesystem is guaranteed to be consistent.)
> 
> It may well be clean, but the fsck tool does not support sanity checks
> yet, so this cannot be verified.
> 
> > Actually, I now wonder what could have gone wrong in my case. I wasn't
> > doing any disk-intensive task, and the machine wasn't suspended to RAM
> > at the time.
> 
> This problem can happen if the block device doesn't support "write
> barrier" properly.  Unfortunately, such devices are not uncommon even
> now.
> 
> I feel we should do something about it.

How about updating only one of the super blocks, alternating between
them, each time a super block update is needed?

I'm hoping that even if "write barrier" is not supported, the backup
super block would likely still point to a valid older log.

If the latest super block, which has the greater checkpoint number (CP),
points to an invalid log, we could search for the log starting from the
other super block, which points to an older log, and roll forward
through the logs until we find a newer super root.

Furthermore, because each super block would be updated less frequently,
this may also address the super block hot-spot issue on low-end
consumer flash media.

thanks,

regards

> > On a separate note, I noticed that Nilfs2 has a higher chance of
> > corruption when I mount it over a (slightly sluggish) network. My backup
> > plan also uses Nilfs2, as detailed on this web page:
> > 
> >   http://www.thev.net/PaulLiu/backup-plan.html
> > 
> > I rarely had any problem doing it over a USB hard drive, but when I
> > remotely mounted the backup image from SMBFS over the network, I ran
> > into a dangling nilfs_cleanerd, or simply a corrupted Nilfs2 partition,
> > quite a few times, to the point that I reverted to a local USB drive.
> > Maybe this kind of use case can help you debug the code and make it
> > more robust. Just a wishful thought!
> 
> Thanks for the information.  Looks helpful for debugging.
> 
> Thanks,
> Ryusuke Konishi
> 
> > On Sun, May 16, 2010 at 10:46 PM, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
> > > Hi,
> > > On Sun, 16 May 2010 14:32:05 -0400, Paul L wrote:
> > >> Sorry, should have sent it to the list instead.
> > >>
> > > >>  Thanks for the patch! I tried it, but it seems it still can't find
> > > >>  the super root. Here is the output. What shall I do now?
> > >>
> > >>  Super-block:
> > >>      revision = 2.0
> > >>      blocksize = 4096
> > >>      write time = 2010-05-14 13:58:33
> > >>      indicated log: blocknr = 1608334
> > >>          segnum = 785, seq = 307637, cno=671100
> > >>
> > >>  Clean FS.
> > >>  The latest log is lost. Trying rollback recovery..
> > >>  .......
> > >>  fsck0.nilfs2: Cannot find super root
> > >
> > > Can you try increasing the number defined at the following line in
> > > sbin/fsck/fsck0.nilfs2.c?
> > >
> > >  #define  MAX_SCAN_SEGMENT          50
> > >
> > > Regards,
> > > Ryusuke Konishi
> > >
> > >> > On 5/15/10, Ryusuke Konishi <ryusuke@xxxxxxxx> wrote:
> > >> >> Hi,
> > >> >> On Fri, 14 May 2010 20:24:02 -0400, Paul L wrote:
> > > >> >>> I have my home directory mounted as a nilfs2 partition. Today I
> > > >> >>> first noticed google-chrome reporting that it could not load the
> > > >> >>> user profile; I initially thought it was a google-chrome error. At
> > > >> >>> the time I was still able to view and modify my home directory, but
> > > >> >>> after rebooting the system, my home partition no longer mounts.
> > > >> >>> I'm using nilfs-2.0.19 and nilfs-utils-2.0.18 with Linux kernel
> > > >> >>> 2.6.28.
> > >> >>>
> > > >> >>> Here is the error message from dmesg (after turning on debugging
> > > >> >>> messages for nilfs2):
> > > >> >>>
> > > >> >>> NILFS nilfs_fill_super: start(silent=0)
> > > >> >>> NILFS(recovery) nilfs_search_super_root: looking segment (seg_start=1607680, seg_end=1609727, segnum=785, seg_seq=307637)
> > > >> >>> NILFS(recovery) load_segment_summary: checking segment (pseg_start=1608334, full_check=0)
> > >> >>> NILFS(recovery) load_segment_summary: done (ret=3)
> > >> >>> NILFS(recovery) nilfs_search_super_root: strayed: scan_newer=0, ret=3
> > >> >>> NILFS warning: Segment magic number invalid
> > >> >>> NILFS: error searching super root.
> > >> >>> NILFS nilfs_fill_super: aborted
> > >> >>> NILFS put_nilfs: the_nilfs on bdev mmcblk0p1 was freed
> > >> >>>
> > > >> >>> I then dumped the first and the last (backup) copies of the nilfs2
> > > >> >>> super block; they are identical and are given below:
> > >> >>>
> > >> >>> 00000400   02 00 00 00 00 00 34 34  00 01 00 00 A1 6A E9 71
> > >> >>> ......44.....j.q
> > >> >>> 00000410   A3 F1 DD BE 02 00 00 00  AF 07 00 00 00 00 00 00
> > >> >>> ................
> > >> >>> 00000420   00 E0 BF D7 03 00 00 00  01 00 00 00 00 00 00 00
> > >> >>> ................
> > >> >>> 00000430   00 08 00 00 05 00 00 00  7C 3D 0A 00 00 00 00 00
> > >> >>> ........|=......
> > >> >>> 00000440   8E 8A 18 00 00 00 00 00  B5 B1 04 00 00 00 00 00
> > >> >>> ................
> > >> >>> 00000450   00 B8 23 00 00 00 00 00  B9 AF F3 4A 00 00 00 00
> > >> >>> ..#........J....
> > >> >>> 00000460   D9 E1 D6 4B 00 00 00 00  49 8F ED 4B 00 00 00 00
> > >> >>> ...K....I..K....
> > >> >>> 00000470   37 00 32 00 03 00 01 00  B9 AF F3 4A 00 00 00 00
> > >> >>> 7.2........J....elp
> > >> >>> 00000480   00 4E ED 00 00 00 00 00  00 00 00 00 0B 00 00 00
> > >> >>> .N..............
> > >> >>> 00000490   80 00 20 00 C0 00 10 00  13 1C FC 11 D7 43 4C 09  ..
> > >> >>> ..........CL.
> > >> >>> 000004A0   81 64 93 0A F4 54 CF 5E  48 4F 4D 45 00 00 00 00
> > >> >>> .d...T.^HOME....
> > >> >>>
> > >> >>>
> > >> >>> I wonder if there is a fsck tool to help me recover the file system.
> > >> >>> Any help is greatly appreciated!
> > >> >>>
> > > >> >>> PS: Last time I had a different problem (lost partition info) and
> > > >> >>> successfully recovered with help from people on the list, so
> > > >> >>> thanks! I now back up my files every two weeks, but it would still
> > > >> >>> be great if this can be recovered, and even better if we can trace
> > > >> >>> the problem.
> > >> >>
> > >> >> Your filesystem seems to have lost the latest log according to the
> > >> >> report.
> > >> >>
> > > >> >> The attached patch may help to recover it.  It is a revised scan
> > > >> >> tool for nilfs-utils-2.0.18.
> > >> >>
> > >> >> After compiling the tool, you can use it like:
> > >> >>
> > >> >>  # cd nilfs-utils-2.0.18
> > >> >>  # sbin/fsck/fsck0 <device>
> > >> >>
> > > >> >> If it finds the latest recoverable log, the tool will ask for
> > > >> >> confirmation before updating the super blocks.
> > >> >>
> > > >> >> You may need to run
> > > >> >>
> > > >> >>  $ aclocal && autoheader && libtoolize -c --force && automake -a -c && autoconf
> > > >> >>  $ ./configure
> > > >> >>
> > > >> >> before building the tool.
> > >> >>
> > >> >> With regards,
> > >> >> Ryusuke Konishi
> > >> >>
> > >> >
> > >> >
> > >> > --
> > >> > Regards,
> > >> > Paul Liu
> > >> >
> > >> > Yale Haskell Group
> > >> > http://www.haskell.org/yale
> > >> >
> > >>
> > >>
> > >> --
> > >> Regards,
> > >> Paul Liu
> > >>
> > >> Yale Haskell Group
> > >> http://www.haskell.org/yale
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> > >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > 

-- 
Jiro SEKIBA <jir@xxxxxxxxx>