Bad directories appearing in ext3 after upgrade 2.4.16 -> 2.4.18+cvs

Hi,
 I recently upgraded one of my fileservers from 2.4.16 to 2.4.18 plus
 the ext3-cvs.patch that Andrew Morton pointed me to for addressing
 an assertion failure.

 Since then I have been getting lots of errors like:

May 21 14:07:03 glass kernel: EXT3-fs error (device md(9,0)): ext3_add_entry: bad entry in directory #2945366: rec_len %% 4 != 0 - offset=0, inode=1886221359, rec_len=24927, name_len=109
May 21 14:07:23 glass kernel: EXT3-fs error (device md(9,0)): ext3_readdir: bad entry in directory #2945366: rec_len %% 4 != 0 - offset=0, inode=1886221359, rec_len=24927, name_len=109
May 21 14:07:23 glass kernel: EXT3-fs warning (device md(9,0)): empty_dir: bad directory (dir #2945366) - no `.' or `..'
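
 For context, the check that trips here is ext3's per-entry sanity test,
 run on every directory entry it walks.  A minimal sketch of it (modelled
 on ext3_check_dir_entry; field names and details may differ between
 versions) looks roughly like this:

    /* Sketch of ext3's per-entry directory sanity check (illustrative,
     * not the literal kernel code).  rec_len must be 4-byte aligned,
     * large enough to hold the name, and must not run past the block. */
    struct dir_entry {
        unsigned long  inode;     /* inode number, 0 if entry unused */
        unsigned short rec_len;   /* length of this entry record */
        unsigned char  name_len;  /* length of the name */
        char           name[255];
    };

    /* smallest record that can hold a name of name_len, rounded up to 4 */
    #define REC_LEN(name_len) (((name_len) + 8 + 3) & ~3)

    static const char *check_dir_entry(const struct dir_entry *de,
                                       unsigned long offset,
                                       unsigned long blocksize)
    {
        if (de->rec_len < REC_LEN(1))
            return "rec_len is smaller than minimal";
        if (de->rec_len % 4 != 0)
            return "rec_len % 4 != 0";        /* the error I am seeing */
        if (de->rec_len < REC_LEN(de->name_len))
            return "rec_len is too small for name_len";
        if (offset + de->rec_len > blocksize)
            return "directory entry across blocks";
        return NULL;                          /* entry looks sane */
    }

 (rec_len=24927 is indeed not a multiple of 4, which is what the first
 two messages are complaining about.)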

 If I hunt down the directory (find . -inum ...) and do an "ls -la",
 it appears empty and responds reasonably well to "rmdir".
 
 So far the directories have mostly been browser caches, so no real
 data has been lost (I think), but it is worrisome.

 I will probably revert to 2.4.16 plus the relevant bits of the CVS
 patch hand-applied.  But I wonder if anyone else has seen this, or
 has any idea what might be happening?
 The directories haven't been moved from buffercache to
 pagecache between 2.4.16 and 2.4.18, or anything like that, have they?

 Possibly related...

 My ext3 filesystem is on a raid5 array, with the journal on
 a separate raid1 array. (data=journal mode).

 I get quite a few messages in the logs which say:
  
May 21 14:20:06 glass kernel: raid5: multiple 1 requests for sector 7540536

 For a variety of sector numbers.

 This means that raid5 has received two separate write requests, with
 two separate buffer heads, for the same sector.  This seems like a
 filesystem error to me.
 
 raid5 tries to apply them in the same order that they were received,
 but I don't feel confident that this means the *right* thing is
 happening.
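
 To show what I mean, here is an illustrative sketch of where a
 "multiple N requests" message can come from (the real logic lives in
 raid5.c's stripe handling; the names and structure here are mine, not
 the actual code):

    #include <stdio.h>

    #define MAX_PENDING 8

    /* One device/sector slot in a stripe: a write that arrives while an
     * earlier write for the same sector is still queued is counted and
     * reported, then queued behind it so arrival order is preserved. */
    struct stripe_slot {
        void *pending_bh[MAX_PENDING]; /* queued write buffers, oldest first */
        int   pending;                 /* how many writes are already queued */
    };

    static void add_write(struct stripe_slot *slot, void *bh,
                          unsigned long sector)
    {
        if (slot->pending)             /* a write is already outstanding */
            printf("raid5: multiple %d requests for sector %lu\n",
                   slot->pending, sector);
        if (slot->pending < MAX_PENDING)
            slot->pending_bh[slot->pending++] = bh;
    }

 The point being that raid5 only preserves the order the requests
 arrived in; it has no way of knowing which buffer head holds the data
 the filesystem actually intended to end up on disk.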

 These were happening with 2.4.16, but the incidence seems to have
 increased with 2.4.18 (though that isn't a very strong observation).

NeilBrown




