On Tue, 30 Apr 2002, Andreas Dilger wrote:

> On Apr 30, 2002 21:54 -0400, Tom Diehl wrote:
>
> It looks like you were getting garbage from the disk before the journal
> assertion happened (i.e. the ext3 error), and the journal assertion is
> just there to save your filesystem from getting corrupted with further
> bad operations.
>
> > This is a stock 7.2 system with all relevant updates.
> > Not sure what other info to provide so if I missed something please let me
> > know.
>
> I would really recommend upgrading to the latest RH errata kernel. The
> ext3 code has had a number of bugs fixed since 2.4.9. It might also be
> related to IDE stuff, don't know.

AFAIK 2.4.9-31 is their latest errata kernel. I just checked the ftp site and
that is the latest one there, although the rumor mill suggests this might
change shortly. I could upgrade to their beta kernel, I suppose.

> > ide1: reset: success
>
> When did that reset happen? It wasn't in the syslog that you sent.

Looks like it happened just before the logs were rotated. I missed it, sorry.
Here it is:

Apr 25 04:35:18 kanga kernel: hdc: timeout waiting for DMA
Apr 25 04:35:18 kanga kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Apr 25 04:35:18 kanga kernel: hdc: status timeout: status=0xd0 { Busy }
Apr 25 04:35:18 kanga kernel: hdd: DMA disabled
Apr 25 04:35:18 kanga kernel: hdc: drive not ready for command
Apr 25 04:35:27 kanga kernel: ide1: reset: success

The next entry is the start of the syslog output I provided in the previous
message, so the reset happened just a few seconds before it:

Apr 25 04:35:53 kanga syslogd 1.4.1: restart.

> > EXT3-fs error (device ide1(22,65)): ext3_readdir: bad entry in directory #2665467: rec_len % 4 != 0 - offset=0, inode=762621470, rec_len=44574, name_len=110
>
> The rec_len is way out. The inode number is probably also bad, but not
> sure...
>
> > Assertion failure in journal_bmap_Rbbdc8009() at journal.c:602: "ret != 0"
> > kernel BUG at journal.c:602!
>
> Just a symptom of bad data, not the real cause. Note that I wanted to
> look at this bit of code, but that assertion is not even there anymore
> (the kernel turns the filesystem read only and just returns now).

So am I understanding you correctly that there is still no good way to tell
whether this was a hardware or a software failure?

-- 
.............Tom
tdiehl@rogueind.com

"Nothing would please me more than being able to
hire ten programmers and deluge the hobby market
with good software." -- Bill Gates 1976

We are still waiting ....