Re: xfs_repair: "fatal error -- ran out of disk space!"

"Patrick J. LoPresti" <lopresti@xxxxxxxxx> · Wed, 22 Jun 2011 16:41:57 -0700

Hi, Dave and Eric.  And thank you for the quick reply.

I blew away a couple of files (200-300 megabytes; I did not write it
down) and then xfs_repair succeeded.  And now "df" shows the partition
as 100% full (265M free out of 5.1T), not 93% full (399G free).
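
For reference, both "df" figures above are consistent with simple
arithmetic on the sizes quoted.  The following is only a sketch: the
helper names are made up for illustration, the sizes are treated as
binary units, and rounding the percentage up is an assumption about how
"df" presents its Use% column.

```python
import math
import os

def percent_used(total, free):
    """Percentage of space in use, rounded up (assumed to match df's Use%)."""
    return math.ceil(100 * (total - free) / total)

def report(mount_point):
    """Print free space and usage for a mounted file system via statvfs."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize   # total size in bytes
    free = st.f_bfree * st.f_frsize     # free bytes, including reserved blocks
    print(f"{mount_point}: {free / 2**30:.1f}G free, "
          f"{percent_used(total, free)}% used")

T, G, M = 2**40, 2**30, 2**20
print(percent_used(5.1 * T, 399 * G))   # before the repair: 93
print(percent_used(5.1 * T, 265 * M))   # after the repair: 100
```

Calling report() on the mount point gives the live numbers; it shows
only what the file system itself reports, so it would have been fooled
by the bad accounting just as "df" was.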

I think the file system actually was full, but corrupted.  The reason
I was trying to run xfs_repair is that the system was acting...
"funny" (but not "ha ha" funny).  Specifically, an nfsd task was
consuming 100% CPU even though no NFS traffic was visible on the
network.  "cat /proc/<pid>/stack" suggested the nfsd was in an
infinite loop calling into XFS trying to allocate an extent or
something.  This nfsd held a lock making it impossible to umount the
partition (among other things).
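
The stack check described above can be repeated a few times to see
whether a task keeps showing up in the same functions.  This is a rough
sketch, not a proper profiler: the function names and sampling
parameters are hypothetical, reading /proc/<pid>/stack usually requires
root, and the kernel may report little or nothing useful for a task
that is running on a CPU at the moment it is sampled.

```python
import re
import time
from collections import Counter
from pathlib import Path

# Matches the symbol name in lines such as "[<0>] some_function+0x1c/0x90".
FRAME_RE = re.compile(r"\]\s+(\S+?)\+0x")

def frames(stack_text):
    """Return the function names found in one /proc/<pid>/stack dump."""
    return FRAME_RE.findall(stack_text)

def sample_stack(pid, samples=20, interval=0.1):
    """Read /proc/<pid>/stack repeatedly; count samples in which each function appears."""
    counts = Counter()
    for _ in range(samples):
        text = Path(f"/proc/{pid}/stack").read_text()  # usually needs root
        counts.update(set(frames(text)))
        time.sleep(interval)
    return counts
```

A function that appears in nearly every sample is a reasonable place to
start looking; sample_stack(pid).most_common(10) lists the top ones.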

My guess is that nfsd was fooled much like df into thinking there was
space available, but when it tried to actually obtain that space, it
was told "please try again".  Which it did, forever.

I guess one question is how xfs_repair should behave in this case.  I
mean, what if the file system had been full, but too corrupt for me to
delete anything?

Anyway, my problem is fixed.  Well, until the filesystem gets
corrupted again, anyway; I still have not identified the underlying
cause of that...

Thank you again for the prompt response.

 - Pat

On Wed, Jun 22, 2011 at 4:24 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Jun 22, 2011 at 05:27:14PM -0500, Eric Sandeen wrote:
>> On 6/22/11 4:32 PM, Patrick J. LoPresti wrote:
>> > I have a 5.1TB XFS file system that is 93% full (399G free according to "df").
>> >
>> > I am trying to run "xfs_repair" on it.
>> >
>> > The output is appended.
>> >
>> > Question:  What am I supposed to do about this?  "xfs_repair -V" says
>> > "xfs_repair version 3.1.5".  (I downloaded and built the latest
>> > version hoping it would fix the issue, but no luck.)  Should I just
>> > start deleting files at random?
>>
>> You could start by removing a few files you know you don't need, rather than
>> at random.  :)
>>
>> TBH I've not seen this one before, and the error message is not all that
>> helpful.  It'd be nice to know how many blocks it was trying to reserve
>> when it ran out of space; I guess you'd need to use gdb, or instrument
>> all the calls to res_failed() in phase6.c to know for sure...
>
> Also, the number of inodes and directories in your filesystem might
> tell us whether we should expect an ENOSPC, as well. I suspect that
> there's an accounting error, because 400GB of transaction
> reservations is an awful lot of directory rebuilds....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
