Re: Subject : Happened again, 20140811 -- Got "Internal error XFS_WANT_CORRUPTED_GOTO". Filesystem needs reformatting to correct issue.

Brian Foster <bfoster@xxxxxxxxxx> · Tue, 12 Aug 2014 17:59:43 -0400

On Tue, Aug 12, 2014 at 02:27:58PM -0700, Eric Sandeen wrote:
> On 8/12/14, 9:51 AM, Brian Foster wrote:
> > On Tue, Aug 12, 2014 at 02:17:00AM +0200, Carlos E. R. wrote:
> > 
> > On 2014-08-12 at 00:36 +0200, Carlos E. R. wrote:
> >>>> On 2014-08-11 at 16:56 -0500, Mark Tinguely wrote:
> > 
> >>>> but all of them are about 401M before compression. The upload will take a
> >>>> long time; my ADSL upload is 0.3M/s at most.
> > 
> > 
> > I have shared a folder with the three files on Google Drive (view access).
> > Both Brian Foster and Mark Tinguely should have received a link by mail from
> > me. If anybody else wants access, just tell me.
> > 
> > 
> >> I see the same thing from repair that was in your repair output:
> > 
> >> block (1,12608397-12608397) multiply claimed by cnt space tree, state - 2
> > 
> >> If I take a look at the btrees as is, I see "235:[12608397,10]" included
> >> in the bnobt (fsb 0x200aa55) and "270:[12608397,10]" in the cntbt (fsb
> >> 0x2000781). If I skip the mount, zero the log and repair, everything
> >> seems Ok. I can allocate the remainder of available space and rm -rf
> >> everything in the fs without an error.
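A minimal C sketch of the consistency rule behind this check, assuming the records of each free-space btree have already been read into memory. The struct and function names are hypothetical, not XFS code. The idea is that the by-block (bnobt) and by-size (cntbt) trees should describe exactly the same set of free extents, only in different orders, so sorting both into one order and comparing them should match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of one free-space record: [startblock,blockcount]. */
struct free_rec {
	uint32_t startblock;
	uint32_t blockcount;
};

/* Total order on records: by start block, then by length. */
static int cmp_rec(const void *a, const void *b)
{
	const struct free_rec *x = a;
	const struct free_rec *y = b;

	if (x->startblock != y->startblock)
		return x->startblock < y->startblock ? -1 : 1;
	if (x->blockcount != y->blockcount)
		return x->blockcount < y->blockcount ? -1 : 1;
	return 0;
}

/*
 * Return true if the bno and cnt record sets are identical as multisets.
 * A duplicate record present in only one tree makes the counts differ.
 * Returns false if the temporary copies cannot be allocated.
 */
static bool same_free_extents(const struct free_rec *bno, size_t nbno,
			      const struct free_rec *cnt, size_t ncnt)
{
	assert((bno != NULL || nbno == 0) && (cnt != NULL || ncnt == 0));

	if (nbno != ncnt)
		return false;
	if (nbno == 0)
		return true;

	struct free_rec *a = malloc(nbno * sizeof(*a));
	struct free_rec *b = malloc(ncnt * sizeof(*b));
	bool same = false;

	if (a != NULL && b != NULL) {
		/* Sort private copies so both trees are compared in one order. */
		memcpy(a, bno, nbno * sizeof(*a));
		memcpy(b, cnt, ncnt * sizeof(*b));
		qsort(a, nbno, sizeof(*a), cmp_rec);
		qsort(b, ncnt, sizeof(*b), cmp_rec);

		same = true;
		for (size_t i = 0; i < nbno; i++) {
			if (cmp_rec(&a[i], &b[i]) != 0) {
				same = false;
				break;
			}
		}
	}
	free(a);
	free(b);
	return same;
}
```

With the pre-replay state described above (one [12608397,10] record in each tree) the sets match; with the post-replay cntbt holding that record twice they do not.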
> > 
> >> Once I replay the log, I see "272:[12608397,10] 273:[12608397,10]" in
> >> the cntbt, which is clearly a duplicate entry. This is what repair
> >> detects and cleans up and seems to lead to the shutdown. E.g., if I
> >> mount and use the fs, I can hit an assert or failure just by attempting
> >> to allocate the rest of the space in the fs. If that is the state of the
> >> fs on disk, it's only a matter of time before we explode due to allocating and
> >> freeing that range of space or possibly attempting to allocate that
> >> space twice.
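The duplicate described above can be spotted with a single pass over the cntbt records. The following is a minimal C sketch that assumes the records are held in a hypothetical in-memory array already sorted in by-size order (length, then start block). The names are illustrative, not XFS or xfs_repair code. Because that order is total, any record that appears twice sits right next to its twin.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory copy of one cntbt record: [startblock,blockcount]. */
struct free_rec {
	uint32_t startblock;
	uint32_t blockcount;
};

/*
 * Walk records assumed sorted by (blockcount, startblock) and report
 * neighbours that are identical. Returns the number of duplicates found.
 */
static size_t count_adjacent_dups(const struct free_rec *recs, size_t n)
{
	size_t dups = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 1; i < n; i++) {
		if (recs[i].startblock == recs[i - 1].startblock &&
		    recs[i].blockcount == recs[i - 1].blockcount) {
			printf("duplicate at index %zu: [%" PRIu32 ",%" PRIu32 "]\n",
			       i, recs[i].startblock, recs[i].blockcount);
			dups++;
		}
	}
	return dups;
}
```

Fed the two post-replay records quoted above (272 and 273, both [12608397,10]), this reports one duplicate; fed the single pre-replay record, it reports none.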
> > 
> >> Mark mentioned that he didn't see the superblock item in the log with
> >> regard to the freeze. I don't see that either... which perhaps suggests
> >> that this all happens during the wake-from-hibernate sequence..? My
> >> understanding is that we should freeze on hibernate, thus force
> >> everything out to the log, write an unmount record and then dirty the
> >> log with a superblock transaction. Therefore, that should be the only
> >> item in the log post-freeze. Here, we have various items in the log
> >> including several logged buffers that correspond to the cntbt block that
> >> ends up corrupted (daddr 0xf427c08).
> 
> What freeze? Look at hibernate(); there's nothing but a sync:
> 
> /**
>  * hibernate - Carry out system hibernation, including saving the image.
>  */
> int hibernate(void)
> {
> ...
>         printk(KERN_INFO "PM: Syncing filesystems ... ");
>         sys_sync();
>         printk("done.\n");
> 
>         error = freeze_processes();
>         if (error)
>                 goto Exit;
> 
> 
> AFAIK there is no freeze call involved.
> 

Eep, not sure why I was thinking there was a freeze there. It appears
not. I guess that explains why the log contains what it does. Thanks for
pointing that out...

Brian

> -Eric
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
