Re: [PATCH, v5] ext3: validate directory entry data before use

Valdis.Kletnieks@xxxxxx · Thu, 03 Jul 2008 03:51:49 -0400

On Mon, 30 Jun 2008 23:00:18 BST, Duane Griffin said:
> ext3_dx_find_entry uses ext3_next_entry without verifying that the entry is
> valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop
> to check the validity of entries before checking whether they match and
> moving onto the next one.
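
(For readers skimming the thread: the failure mode in the quoted description
can be sketched in userspace C. This is only an illustration under stated
assumptions, not the ext3 code and not the patch itself. The struct layout and
the names toy_hdr, toy_put, toy_entry_ok and toy_find are all hypothetical.
The point is just the ordering: validate an entry, including rec_len, before
comparing its name or stepping to the next one.)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry header; NOT the real ext3 on-disk layout. */
struct toy_hdr {
    uint32_t inode;     /* 0 means "unused slot" in this sketch */
    uint16_t rec_len;   /* bytes from this entry to the next one */
    uint8_t  name_len;  /* length of the name that follows the header */
    uint8_t  pad;
};

/* Test helper: write one entry (header + name) at offset off. */
static void toy_put(uint8_t *block, size_t off, uint32_t inode,
                    uint16_t rec_len, const char *name)
{
    struct toy_hdr h = { inode, rec_len, (uint8_t)strlen(name), 0 };
    memcpy(block + off, &h, sizeof h);
    memcpy(block + off + sizeof h, name, h.name_len);
}

/* Copy out the header at off and check that it is sane. Returns 1 if OK. */
static int toy_entry_ok(const uint8_t *block, size_t size, size_t off,
                        struct toy_hdr *h)
{
    if (off + sizeof *h > size)
        return 0;                       /* header runs off the block */
    memcpy(h, block + off, sizeof *h);
    if (h->rec_len < sizeof *h + h->name_len)
        return 0;                       /* also rejects rec_len == 0 */
    if (h->rec_len % 4 != 0)
        return 0;                       /* assumed 4-byte alignment */
    if (off + h->rec_len > size)
        return 0;                       /* entry spills past the block */
    return 1;
}

/*
 * Returns the offset of the matching entry, -1 if the name is not present,
 * or -2 if a corrupt entry is reached first. Because every entry is
 * validated before off is advanced, a zero rec_len cannot spin forever.
 */
static long toy_find(const uint8_t *block, size_t size, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off < size) {
        struct toy_hdr h;

        if (!toy_entry_ok(block, size, off, &h))
            return -2;
        if (h.inode != 0 && h.name_len == want &&
            memcmp(block + off + sizeof h, name, want) == 0)
            return (long)off;
        off += h.rec_len;
    }
    return -1;
}
```

(Without the rec_len check, an entry with rec_len == 0 leaves off unchanged
and the loop never terminates, which is the hang the quoted text describes.)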

This may or may not be related, but I've managed to hit another interesting
piece of ext3 damage while running 26-rc8-mmotd-0701:

% /bin/ls -l /usr/share/man/man5 | grep lvm
/bin/ls: cannot access /usr/share/man/man5/lvm.conf.5.gz: Stale NFS file handle
-????????? ? ?    ?        ?                ? lvm.conf.5.gz

Yes, that *is* on an ext3 filesystem.

debugfs on /usr/share is interesting:

debugfs:  stat  /man/man5/lvm.conf.5.gz
Inode: 59918   Type: regular    Mode:  0644   Flags: 0x0
Generation: 4228691378    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 239201    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x486c6c0b -- Thu Jul  3 02:04:59 2008
atime: 0x47efcad7 -- Sun Mar 30 13:16:07 2008
mtime: 0x486c6c0b -- Thu Jul  3 02:04:59 2008
dtime: 0x486c6c0b -- Thu Jul  3 02:04:59 2008
BLOCKS:

Zero links, even though man/man5 references it.  And the ctime/mtime/dtime
are suspicious as well - that file belongs to an RPM that was last updated
back on June 20, and there are no obvious culprit processes in lastcomm that
were running at 2:04AM, and none of the current ones look obvious either.

(The system was booted at 00:21, so the failure happened about 1 hour 40 mins
after the current kernel launched.)

Nothing in dmesg from around 2:04AM, and nothing around when the /bin/ls is run.

An 'ls -lR /usr/share' shows that the *other* 127,619 files on the filesystem
are all OK, it's just this one.

Any brilliant ideas on how to track this down further?
