Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 12 Jul 2010 15:45:59 +1000

On Mon, Jul 12, 2010 at 11:08:32AM +1000, Dave Chinner wrote:
> On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> > On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
> > > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >>
> > >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> > >> > Hi,
> > >> >
> > >> > We have been able to repeatably produce xfs internal errors
> > >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> > >> > to locally copy a 248GB file off a USB drive formatted as NTFS to the
> > >> > xfs drive. The copy gets about 96% of the way through and we get the
> > >> > following messages:
> > >> >
> > >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> > >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> > >> > Caller 0xffffffff8837446f
> > >>
> > >> Interesting. That's a corrupted inode extent btree - I haven't seen
> > >> one of them for a long while. Were there any errors (like IO errors)
> > >> reported before this?
> > >>
> > >> However, the first step is to determine if the error is on disk or an
> > >> in-memory error. Can you post output of:
> > >>
> > >>        - xfs_info <mntpt>
> > 
> > meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
> >          =                         sectsz=512   attr=1
> > data     =                         bsize=4096   blocks=4272433152, imaxpct=25
> >          =                         sunit=0      swidth=0 blks
> > naming   =version 2                bsize=4096   ascii-ci=0
> > log      =internal                 bsize=4096   blocks=2560, version=1
> >          =                         sectsz=512   sunit=0 blks, lazy-count=0
> > realtime =none                     extsz=4096   blocks=0, rtextents=0
> 
> Why did you make this filesystem with 128MB allocation groups? The
> default for a filesystem of this size is 1TB allocation groups.
> More than 100k allocation groups will certainly push internal AG
> scanning scalability past its tested limits....
> 
> Also, a log of 10MB is rather small, and it tells me that you didn't
> just create this filesystem directly on the 16TB block device with a
> recent mkfs.xfs. That is, with current mkfs.xfs defaults, to get a
> layout like this you'd have to start with a 512MB filesystem and
> grow it to 16TB.

Actually, an old mkfs that defaults to 16 AGs and a filesystem size
of 2GB is needed to get a log of 2560 blocks. I just grew one of
these to roughly 16TB and ended up with 125,000 AGs, so it's in the
ballpark. Also, *allocating* 250GB to a single file (as
preallocation) doesn't appear to have any problems on 2.6.35-rc4, so
there doesn't appear to be any general error caused by this
configuration in mainline....
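
For reference, the sizes mentioned above fall straight out of the
xfs_info numbers. A quick sketch of the arithmetic in Python follows;
the variable names are just for illustration, and the 2GB/16 AG
starting point is an assumption about how the filesystem was
originally made, not something that has been confirmed:

```python
# Values copied from the xfs_info output quoted earlier in this thread.
BLOCK_SIZE = 4096          # bsize, bytes per filesystem block
AG_BLOCKS = 32768          # agsize, blocks per allocation group
DATA_BLOCKS = 4272433152   # data section size in blocks
LOG_BLOCKS = 2560          # internal log size in blocks

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def ceil_div(a, b):
    """Integer division rounding up (the last AG may be a short one)."""
    return -(-a // b)


ag_bytes = AG_BLOCKS * BLOCK_SIZE            # size of one allocation group
log_bytes = LOG_BLOCKS * BLOCK_SIZE          # size of the log
fs_bytes = DATA_BLOCKS * BLOCK_SIZE          # size of the data section
ag_count = ceil_div(DATA_BLOCKS, AG_BLOCKS)  # AGs needed to cover the data

# Assumption: the filesystem started life as 2 GiB split into 16 equal AGs.
orig_ag_bytes = (2 * GiB) // 16

print("AG size:  %d MiB" % (ag_bytes // MiB))
print("log size: %d MiB" % (log_bytes // MiB))
print("fs size:  %.2f TiB" % (fs_bytes / TiB))
print("AG count: %d" % ag_count)
print("2 GiB / 16 AGs gives the same AG size:", orig_ag_bytes == ag_bytes)
```

The computed AG count agrees with the agcount that xfs_info reports,
and a 2GB filesystem with 16 AGs gives the same 128MB AG size, which
growfs then keeps as it adds AGs.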

Can you run this command:

# xfs_io -f -c "truncate 250g" -c "resvsp 0 250g" <test file>

And see if that generates the same corruption as copying a file?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
