On Mon, Jul 12, 2010 at 3:45 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 12, 2010 at 11:08:32AM +1000, Dave Chinner wrote:
> > On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> > > On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
> > > > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > >>
> > > >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > > >> > Hi,
> > > >> >
> > > >> > We have been able to repeatably produce XFS internal errors
> > > >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > > >> > to locally copy a 248GB file off a USB drive formatted as NTFS to the
> > > >> > XFS drive. The copy gets about 96% of the way through and we get the
> > > >> > following messages:
> > > >> >
> > > >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > > >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > > >> > Caller 0xffffffff8837446f
> > > >>
> > > >> Interesting. That's a corrupted inode extent btree - I haven't seen
> > > >> one of them for a long while. Were there any errors (like IO errors)
> > > >> reported before this?
> > > >>
> > > >> However, the first step is to determine if the error is on disk or an
> > > >> in-memory error. Can you post the output of:
> > > >>
> > > >> - xfs_info <mntpt>
> > >
> > > meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
> > >          =                         sectsz=512   attr=1
> > > data     =                         bsize=4096   blocks=4272433152, imaxpct=25
> > >          =                         sunit=0      swidth=0 blks
> > > naming   =version 2                bsize=4096   ascii-ci=0
> > > log      =internal                 bsize=4096   blocks=2560, version=1
> > >          =                         sectsz=512   sunit=0 blks, lazy-count=0
> > > realtime =none                     extsz=4096   blocks=0, rtextents=0
> >
> > Why did you make this filesystem with 128MB allocation groups? The
> > default for a filesystem of this size is 1TB allocation groups.
> > More than 100k allocation groups will certainly push internal AG
> > scanning scalability past its tested limits....
> >
> > Also, a log of 10MB is rather small, and it tells me that you didn't
> > just create this filesystem directly on the 16TB block device with a
> > recent mkfs.xfs. That is, with current mkfs.xfs defaults you'd have
> > to start with a 512MB filesystem and grow it to 16TB to get a layout
> > like this.
>
> Actually, an old mkfs that defaults to 16 AGs and a filesystem size
> of 2GB is needed to get a log of 2560 blocks. I just grew one of
> these to roughly 16TB and ended up with 125,000 AGs, so it's in the
> ballpark. Also, *allocating* 250GB to a single file (as
> preallocation) doesn't appear to have any problems on 2.6.35-rc4, so
> there doesn't appear to be any general error caused by this
> configuration in mainline....
>
> Can you run this command:
>
> # xfs_io -f -c "truncate 250g" -c "resvsp 0 250g" <test file>
>
> And see if that generates the same corruption as copying a file?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Hi David,

After many weeks of planning to move all the data off and back on
again, we are happy to say that the partition is now working as
expected. In the end we managed to back up all the data on our
partition and re-create it using the mkfs.xfs default options. This
time we have 16 allocation groups, as you suggested we should. It
appears the original partition was grown from an extremely small
size, which is what created that many allocation groups.

I would like to thank the xfs mailing list for all its help.
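For anyone who finds this thread later, here is a minimal sketch of
how a geometry like ours can arise from growing a tiny filesystem.
The image path, mount point, loop mount, and agsize=128m are
illustrative assumptions (not the exact history of our array), and
the commands need root:

  # Sparse 16TB backing file; mkfs only the first 2GB with 128MB AGs,
  # roughly mimicking the old mkfs.xfs default of 16 AGs on a 2GB device.
  truncate -s 16T /tmp/xfs.img
  mkfs.xfs -d size=2g,agsize=128m /tmp/xfs.img
  mkdir -p /mnt/test
  mount -o loop /tmp/xfs.img /mnt/test

  # Grow to fill the whole backing file. agsize is fixed at mkfs time,
  # so xfs_growfs can only add more 128MB AGs:
  # 16TB / 128MB = 131072 AGs (close to the agcount=130385 we saw).
  xfs_growfs /mnt/test
  xfs_info /mnt/test | grep agcount

Re-creating the filesystem at its full size lets mkfs.xfs pick ~1TB
allocation groups instead, which is why our new layout has only 16.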
Regards,
Shaun

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs