Nasty ext3 errors 2.4.18


 



Hi

 

I’ve got serious trouble. I posted a while back about ext3 errors on 2.4.18; at the time I put them down to hard disk failure, but they are occurring more and more often. Not all of our systems are affected, but 3 have now shown the problem.

The hardware is essentially the same on each system; the only difference is the disk manufacturer, and we’ve now seen the problem on several brands of disk (Maxtor, Seagate, IBM etc.), so I can’t simply put this down to disk failure (motherboard failure possibly, but on 3 different motherboards?).

 

I’ve included some of the kernel debug output below. There is a lot of it, so I’ve only included the different types of error reported (with the times when the problems started).

 

Thu 12/12/02 15:50:37.315  [KMSG:<2>EXT3-fs error (device ide0(3,10)): ext3_free_blocks: Freeing blocks in system zones - Block = 128, count = 1]

 

Fri 13/12/02 23:55:46.383  [KMSG:<4> <6>attempt to access beyond end of device]

[KMSG:<6>16:05: rw=0, want=137058900, limit=39230698]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_branches: Read failure, inode=3567502, block=-1576348012]

[KMSG:<6>attempt to access beyond end of device]

[KMSG:<6>16:05: rw=0, want=1724901664, limit=39230698]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_branches: Read failure, inode=3567502, block=-642516409]

[KMSG:<6>attempt to access beyond end of device]

<snip>
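For anyone sifting through a log like this, here is a minimal Python sketch that flags requests beyond the device limit and rewrites the negative block numbers as unsigned values. It assumes that `want` and `limit` are printed in the same units, and that the negative block numbers are 32-bit values printed as signed integers; both are assumptions rather than something confirmed from the kernel source. The names `as_u32` and `check_line` are made up for illustration.

```python
import re

# Patterns for the fields as they appear in the log lines quoted above.
WANT_LIMIT_RE = re.compile(r"want=(\d+), limit=(\d+)")
BLOCK_RE = re.compile(r"block\s*=\s*(-?\d+)")


def as_u32(value):
    """Reinterpret a (possibly negative) 32-bit value as unsigned.

    Assumption: the kernel printed an unsigned 32-bit block number
    with a signed format, which is why some values appear negative.
    """
    return value & 0xFFFFFFFF


def check_line(line):
    """Summarise a want/limit or block field found on a log line.

    Returns None if the line has neither field, or if the request
    is within the device limit.
    """
    m = WANT_LIMIT_RE.search(line)
    if m:
        want, limit = int(m.group(1)), int(m.group(2))
        if want > limit:
            return f"request for {want} exceeds device limit {limit}"
        return None
    m = BLOCK_RE.search(line)
    if m:
        return f"block {as_u32(int(m.group(1)))} (unsigned)"
    return None


if __name__ == "__main__":
    sample = [
        "[KMSG:<6>16:05: rw=0, want=137058900, limit=39230698]",
        "[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_branches: "
        "Read failure, inode=3567502, block=-1576348012]",
    ]
    for line in sample:
        print(check_line(line))
```

Read this way, every block number in the excerpt is far larger than the reported device limit, which would fit the indirect blocks of inode 3567502 containing garbage rather than real block pointers.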

 

Fri 13/12/02 23:55:46.411 [KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_branches: Read failure, inode=3567502, block=1329885327]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 1874129395, count = 1]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 203477977, count = 1]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 2877124100, count = 1]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 103093662, count = 1]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 3719271906, count = 1]

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 4274192639, count = 1]

<snip>

 

 

[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: bit already cleared for block 9126180]

<snip>
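Since the full log is much longer than the excerpt, a tally per device and reporting function can show whether the errors are confined to one partition. The following is only a sketch, assuming every line follows the `EXT3-fs error (device ...): function: message` layout seen above; `tally_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures the device, e.g. "ide1(22,5)", and the reporting function,
# e.g. "ext3_free_blocks", from an EXT3-fs error line.
ERROR_RE = re.compile(r"EXT3-fs error \(device (\w+\(\d+,\d+\))\): (\w+):")


def tally_errors(lines):
    """Count EXT3-fs error lines per (device, function) pair."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[KMSG:<2>EXT3-fs error (device ide0(3,10)): ext3_free_blocks: "
        "Freeing blocks in system zones - Block = 128, count = 1]",
        "[KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: "
        "bit already cleared for block 9126180]",
        "[KMSG:<6>attempt to access beyond end of device]",
    ]
    for (device, function), n in tally_errors(sample).items():
        print(device, function, n)
```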

 

 

Fri 13/12/02 23:56:00.242 [KMSG:<2>EXT3-fs error (device ide1(22,5)): ext3_free_blocks: Freeing blocks not in datazone - block = 3305048842, count = 1]

[KMSG:<0>Assertion failure in do_get_write_access() at transaction.c:589: "handle->h_buffer_credits > 0"]

[KMSG:<4>invalid operand: 0000]

[KMSG:<4>CPU:    0]

[KMSG:<4>EIP:    0010:[<c0156d09>]    Not tainted]

[KMSG:<4>EFLAGS: 00010286]

[KMSG:<4>eax: 00000063   ebx: cefb4ac0   ecx: c55ac3c0   edx: fffffffe]

[KMSG:<4>esi: c7083f00   edi: 00000002   ebp: c7083f00   esp: c0605c20]

[KMSG:<4>ds: 0018   es: 0018   ss: 0018]

[KMSG:<4>Process videoexe (pid: 3412, stackpage=c0605000)]

[KMSG:<4>Stack: c0232720 c02328e6 c0232700 0000024d c0232921 cce0f000 ccd24a60 cefb4ac0 ]

[KMSG:<4>       cce0f094 cce0f094 00000000 00000000 cce0f000 cc434760 c01570d8 ccd24a60 ]

[KMSG:<4>       cefb4ac0 00000000 00000000 c7083f00 ccd24a60 c39cf460 c0150798 ccd24a60 ]

[KMSG:<4>Call Trace: [<c01570d8>] [<c0150798>] [<c01570e0>] [<c01508fc>] [<c0150b98>] ]

[KMSG:<4>   [<c0150a68>] [<c0150a68>] [<c0150a68>] [<c0150c79>] [<c0150f0b>] [<c015763c>] ]

[KMSG:<4>   [<c01516c2>] [<c0151727>] [<c0121b5f>] [<c011feae>] [<c011ff0d>] [<c0140567>] ]

[KMSG:<4>   [<c01518d1>] [<c01406bc>] [<c012cc3d>] [<c013796a>] [<c012dae7>] [<c012de3a>] ]

[KMSG:<4>   [<c0106b87>] ]

[KMSG:<4>]

[KMSG:<4>Code: 0f 0b 83 c4 14 8b 54 24 28 8b 42 04 48 8b 4c 24 28 89 41 04 ]

 

^^^

That’s the final nail in the coffin, as the process then locks solid (but still has threads running, which then run out of memory, and total chaos ensues) and the box has to be power-cycled. The disks are fsck’d when the machine comes back up, with no reports of any hardware I/O errors.

 

The profile of the machine is that it’s doing lots of disk I/O, as it’s capturing video to disk. There are 3 partitions in use, each roughly 40 GB in size, and the problem is only occurring on one of them.

The only thing to note is that this partition and another filled up and I had to make space (just by deleting files; the maximum used space is now around 76%). Once I’d cleaned up the file system, the system ran fine until the first error was reported at 15:50 on Thursday (as shown above), and then on Friday (last night) it just went haywire.

 

The only other thing to note is that there was a panic in kswapd a number of hours earlier, but I’ve seen these on other systems running 2.4.18 and they don’t seem to cause any problems (I think).

 

As I’ve mentioned, I’ve seen the same behavior before on other systems. The specs for all of them are:

 

Abit ST6 motherboard with 1.2 GHz Celeron

2 x disks (varying sizes and makes)

128 MB RAM

AGP Graphics Card

Ethernet

Bt848 capture cards (2-3 depending on customer)

 

I’m really pulling my hair out. I don’t know why they are doing this. These are all on customer sites (they never go wrong in the office; each one that has gone bad has been in a different environment, i.e. warm, cold, with no power spikes or anything else reported), and at the moment, as you can imagine, we are not flavor of the month, so I really need to come up with a bullet-proof plan (one customer is on his second box, which did the same as the first after 2 days, having run in our office for 2 weeks with no problems!).

 

I know I’ve probably not given enough info (sorry, I can’t get a better trace of the panic), but any help that anyone can give will be really, really appreciated.

 

Thanks,

 

Glen

 

