Re: FS corruption; HTREE-related?

"JP Howard" <jh_lists@fastmail.fm> · Mon, 7 Oct 2002 09:35:32 UT

On Mon, 7 Oct 2002 09:56:27 +0100, "Stephen C. Tweedie" <sct@redhat.com>
said:
> On Mon, Oct 07, 2002 at 05:07:20AM +0000, JP Howard wrote:
> > e2fsck shows "Inodes that were part of a corrupted orphan linked list
> > found."
> 
> That can be a harmless side-effect of an older version of e2fsck.  If
> e2fsck finds a partial truncate which needs to be completed, it used
> to leave the dtime field of the inode intact rather than clearing it.
> That shows up on the *next* forced fsck as the error you saw.  If you
> used an older fsck on the last reboot, that would explain this
> message.
> 
OK, I had a feeling that was unrelated.
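To make the explanation above concrete, here is a simplified Python model of why a stale dtime trips the message. It assumes (not stated in the thread) that ext3 reuses `i_dtime` as the "next orphan" pointer, and that a forced fsck flags any in-use inode whose dtime is nonzero but smaller than the filesystem's inode count. A value that small looks like an inode number, not a timestamp. The `Inode` class and `orphan_list_refugees` function are hypothetical illustrations, not e2fsck's actual code.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    ino: int
    links_count: int  # > 0 means the inode is still in use
    dtime: int        # deletion time; assumed reused as the orphan-list "next" pointer

def orphan_list_refugees(inodes, inodes_count):
    """Return inode numbers a forced fsck would flag (simplified model).

    An in-use inode with a small nonzero dtime looks like a leftover
    orphan-list link rather than a real deletion timestamp.
    """
    return [
        i.ino for i in inodes
        if i.links_count > 0 and 0 < i.dtime < inodes_count
    ]

inodes = [
    Inode(12, links_count=1, dtime=0),           # healthy file
    Inode(13, links_count=1, dtime=57),          # truncate finished, stale orphan pointer left in dtime
    Inode(14, links_count=0, dtime=1033983332),  # genuinely deleted file
]
print(orphan_list_refugees(inodes, inodes_count=2_000_000))  # [13]
```

In this model only inode 13 is reported. That matches the scenario described above: the truncate was completed by an older e2fsck, but the dtime field was never cleared.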

> > debugfs:  testi 392.
> > Inode 14992585 is not in use
> 
> Deleted file confirmed.
> 
However, this file shouldn't have shown as deleted. It looks a lot like
the wrong inode got changed when some *other* file was deleted...

> > Any ideas on what's causing this? e2fsck causes the problem files to be
> > removed. For now we've disabled directory indexing--if the problem
> > continues after doing this, I'll update the list with the details.
> 
> Thanks, we need that to work out where the problem might be.
> 
Repeating the same workload with dir indexing off has not reproduced the
problem. I've run it twice as long as yesterday with no errors, so it
certainly looks like an htree issue.

Regarding the kernel messages, we saw exactly the same as this gentleman:
----
On Sun, 6 Oct 2002 11:00:49 -0400, "Douglas J Hunley"
<doug@hunley.homeip.net> said:
> In addition to the 'brelse' issue I mentioned a few days back, I'm seeing
> new messages in my logs about ext3 trying to unlink nonexistent files. Any
> ideas? Log messages are below:
> Oct  5 19:34:40 linux-sxs kernel: EXT3-fs warning (device sd(8,19)): ext3_unlink: Deleting nonexistent file (1433981), 0
> Oct  5 19:38:32 linux-sxs kernel: VFS: brelse: Trying to free free buffer
----

> If you just cleared the dir index bit with htree, then the htree data
> structures will be completely ignored, but you can recreate them with
> fsck.
> 
That's what I did.

I'm now completing our production server migration without dir indexing,
but am creating a loopback filesystem to try to replicate this problem so
we can fix it and use htree soon. I think we'll need it to get the
performance we've budgeted for on our new servers. If we can replicate
the failure, what other diagnostics should we provide? Is there any
additional telemetry we can collect?
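For the loopback reproduction, a minimal Python sketch of a create/delete workload could look like this. It checks whether deleting some files in a large directory ever makes an unrelated file vanish or change, which is the symptom described above. The function name, file naming scheme, and counts are all hypothetical. For a real run, `target` would be a directory on the loopback filesystem with dir_index enabled.

```python
import os
import tempfile

def churn_and_verify(target, n_files=10000, rounds=3):
    """Create files in one directory, delete every other one, and return any
    survivors that vanished or whose contents changed."""
    bad = []
    for r in range(rounds):
        names = [os.path.join(target, f"r{r}_f{i:06d}") for i in range(n_files)]
        for path in names:
            with open(path, "w") as f:
                f.write(path)  # each file's contents record its own name
        # Delete the even-numbered files; every odd-numbered file must survive.
        for path in names[0::2]:
            os.unlink(path)
        survivors = names[1::2]
        for path in survivors:
            if not os.path.exists(path):
                bad.append(path)
                continue
            with open(path) as f:
                if f.read() != path:
                    bad.append(path)
        # Remove the remaining files before the next round.
        for path in survivors:
            if os.path.exists(path):
                os.unlink(path)
    return bad

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(churn_and_verify(d, n_files=1000, rounds=2))  # expect [] on a healthy filesystem
```

On the loopback filesystem you would pass its mount point instead of a temporary directory. A non-empty result, or `ext3_unlink: Deleting nonexistent file` warnings in `dmesg` like those quoted above, would point at the same bug.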

When we turn on dir indexing again, we'll need to run fsck -D because the
filesystem will already be full of data. Do we need to do that on an
unmounted filesystem? How long might we expect it to take on around ten
million files in a 5-disk RAID 5 array--minutes, hours, or days? We'll do
some timing on our loopback filesystem, but it won't be a very accurate
estimate because the machine will be heavily loaded...
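The re-enable step asked about above can be sketched in Python. This sketch assumes the feature is turned back on with `tune2fs -O dir_index` and the indexes are rebuilt with `e2fsck -f -D`. It also follows the e2fsck manual page's advice that running e2fsck on a mounted filesystem is generally unsafe. The device name and helper functions are hypothetical. The mount check only compares the exact device string listed in `/proc/mounts`, so it does not resolve symlinks or aliases.

```python
import subprocess

def is_mounted(device, mounts_text):
    """True if `device` is the source field of any line in /proc/mounts-style text."""
    return any(line.split()[0] == device for line in mounts_text.splitlines() if line.strip())

def reindex_commands(device):
    """Commands to turn dir_index back on and rebuild directory indexes."""
    return [
        ["tune2fs", "-O", "dir_index", device],  # set the feature flag
        ["e2fsck", "-f", "-D", device],          # force a full check and optimize/reindex directories
    ]

def reindex(device, dry_run=True):
    """Refuse to touch a mounted device, then print or run the commands."""
    with open("/proc/mounts") as f:
        if is_mounted(device, f.read()):
            raise RuntimeError(f"{device} is mounted; unmount it first")
    for cmd in reindex_commands(device):
        if dry_run:
            print(" ".join(cmd))
            continue
        rc = subprocess.run(cmd).returncode
        # e2fsck exits 1 when it modified the filesystem, which -D typically does;
        # exit codes of 4 and above mean uncorrected errors or an operational failure.
        limit = 4 if cmd[0] == "e2fsck" else 1
        if rc >= limit:
            raise RuntimeError(f"{cmd[0]} exited with status {rc}")
```

With the default `dry_run=True`, `reindex("/dev/md0")` only prints the two commands. This makes it easy to review them, and to time the e2fsck pass on the loopback image, before touching the RAID array.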

_______________________________________________

Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users