On Wed, Oct 08, 2003 at 05:31:09PM +0200, Maciej W. Rozycki wrote:
> The trace dump looks through the kernel stack and uses simple heuristics
> to judge whether a word should be included or not: if it is in the range
> covered by the kernel's text segment, it's printed. It might be pure
> coincidence a specific value corresponding to a kernel address is present
> at the stack as it may actually be a leftover from past execution, e.g.
> within a stack frame reserved for local variables that hasn't been
> initialized yet, or are simply unused for a particular execution path.
> You need to analyze the backtrace, comparing it to actual code involved to
> see which of the addresses are results of real function calls.
>
> Well, most interrupt handlers can be interrupted by other interrupts,
> only the high priority ones cannot. These are marked with SA_INTERRUPT in
> the flags. Of course the entry code for IRQ handling cannot be
> interrupted and only a single interrupt source is selected for handling
> based on predefined priorities, but once execution reaches
> handle_IRQ_event() (which calls specific handlers registered by drivers),
> another interrupt can be taken.
>
> The trace doesn't look suspicious at first sight to me.
>
> The PMAD-A cannot be handled the same way as the others since it has a
> sane buffer space layout, something that cannot be said of the others.
> Therefore the stock declance.c driver doesn't handle the PMAD-A properly
> -- that's functionality that needs to be implemented when the driver gets
> restructured (it'll happen for 2.6 and probably a backport to 2.4 will be
> available later as well). There is a patch that converts the stock driver
> into one working for the PMAD-A (but it doesn't work for the others then)
> and I'm told Debian uses thus modified code as a separate driver.
> The patch is based on work by Dave Airlie and is available here:
> 'ftp://ftp.ds2.pg.gda.pl/pub/macro/drivers/pmad-a/patch-mips-2.4.20-pre6-20021222-declance-pmad-12.gz'
> -- it applies cleanly to the current version of declance.c.

Well, as far as I can tell, I am running declance for the onboard interface and pmadaa for the add-on card.

> Both dev->mem_start and dev->mem_end are initialized incorrectly as they
> should use bus addresses and now they use CPU virtual ones. For
> MIPS-based TURBOchannel systems, the mapping between the addresses is
> quite straightforward, but it's not necessarily the case for the others.
> The addresses should also be used for I/O resource allocation management
> which is not implemented in the driver.
>
> Your point about dev->mem_end is of course valid -- the bug wasn't
> noticed, because the variable isn't used for anything in these cases.
>
> Please send me the full oops report and I'll see what I can decipher from
> it.

Here is what I got on the console:

Instruction bus error, epc == 80045ae0, ra == 8005d8a8
Oops in traps.c::do_be, line 491:
$0 : 00000000 80280000 80280000 000f48b0 8027ef94 000f48b0 8027ef94 00000000
$8 : 8023e108 bc040000 00000020 874d227c 86b857e4 86b857e8 86b857e0 00000008
$16: 00000000 00000000 00000000 8027ebc0 80259820 fffffffe 00000000 04102060
$24: 00000000 2b107a90 8023e000 8023fde0 8043ff80 8005d8a8
Hi : 00000000
Lo : 00000600
epc : 80045ae0 Not tainted
Status: 1000e400
Cause : 00000018
Process swapper (pid: 0, stackpage=8023e000)
Stack: 00000000 8005da74 00000000 04102060 00000001 8005dd18 8027ebe0 20000001
       8027ebe0 8005de64 8027ebe0 800598e4 00000000 811b2940 8023fea8 801263bc
       800596a0 00000000 80158898 80158888 000000c0 80259848 00000000 80259838
       fffffffe 1000e400 8023fea8 00000000 80059170 30000400 fffffffb 00000011
       8004a6e8 8667c000 802590d0 8026c97c fffffffb 0000000d 8004a728 8044e2d0 ...
Call Trace: [<8005da74>] [<8005dd18>] [<8005de64>] [<800598e4>] [<801263bc>] [<800596a0>]
 [<80158898>] [<80158888>] [<80059170>] [<8004a6e8>] [<8004a728>] [<80125574>]
 [<80125574>] [<800432dc>] [<800432c0>] [<8020a37c>] [<8004042c>] [<8020959c>]

Code: 03a02021 080115e0 00000000 <401a6000> 00000000 001ad0c0 07400003 03a0d821 3c1b802b
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Run through ksymoops, it becomes:

Instruction bus error, epc == 80045ae0, ra == 8005d8a8
Oops in traps.c::do_be, line 491:
$0 : 00000000 80280000 80280000 000f48b0 8027ef94 000f48b0 8027ef94 00000000
$8 : 8023e108 bc040000 00000020 874d227c 86b857e4 86b857e8 86b857e0 00000008
$16: 00000000 00000000 00000000 8027ebc0 80259820 fffffffe 00000000 04102060
$24: 00000000 2b107a90 8023e000 8023fde0 8043ff80 8005d8a8
Hi : 00000000
Lo : 00000600
epc : 80045ae0 Not tainted
Using defaults from ksymoops -t elf32-tradlittlemips -a mips:3000
Status: 1000e400
Cause : 00000018
Process swapper (pid: 0, stackpage=8023e000)
Stack: 00000000 8005da74 00000000 04102060 00000001 8005dd18 8027ebe0 20000001
       8027ebe0 8005de64 8027ebe0 800598e4 00000000 811b2940 8023fea8 801263bc
       800596a0 00000000 80158898 80158888 000000c0 80259848 00000000 80259838
       fffffffe 1000e400 8023fea8 00000000 80059170 30000400 fffffffb 00000011
       8004a6e8 8667c000 802590d0 8026c97c fffffffb 0000000d 8004a728 8044e2d0 ...
Call Trace: [<8005da74>] [<8005dd18>] [<8005de64>] [<800598e4>] [<801263bc>] [<800596a0>]
 [<80158898>] [<80158888>] [<80059170>] [<8004a6e8>] [<8004a728>] [<80125574>]
 [<80125574>] [<800432dc>] [<800432c0>] [<8020a37c>] [<8004042c>] [<8020959c>]
Code: 03a02021 080115e0 00000000 <401a6000> 00000000 001ad0c0 07400003 03a0d821 3c1b802b

>>RA;  8005d8a8 <update_wall_time+18/7c>
>>$1;  80280000 <uidhash_table+2c/40c>
>>$2;  80280000 <uidhash_table+2c/40c>
>>$4;  8027ef94 <xtime+4/8>
>>$6;  8027ef94 <xtime+4/8>
>>$8;  8023e108 <init_task_union+108/2000>
>>$19; 8027ebc0 <irq_stat+0/20>
>>$20; 80259820 <tasklet_hi_vec+0/10>
>>$28; 8023e000 <init_task_union+0/2000>
>>$29; 8023fde0 <init_task_union+1de0/2000>
>>$31; 8005d8a8 <update_wall_time+18/7c>

>>PC;  80045ae0 <handle_ibe+0/cc>   <=====

Trace; 8005da74 <update_process_times+34/11c>
Trace; 8005dd18 <timer_bh+160/168>
Trace; 8005de64 <do_timer+144/14c>
Trace; 800598e4 <bh_action+60/d8>
Trace; 801263bc <timer_interrupt+f8/1cc>
Trace; 800596a0 <tasklet_hi_action+110/1a4>
Trace; 80158898 <lance_interrupt+2b0/2d8>
Trace; 80158888 <lance_interrupt+2a0/2d8>
Trace; 80059170 <do_softirq+1a0/1a8>
Trace; 8004a6e8 <do_IRQ+e4/12c>
Trace; 8004a728 <do_IRQ+124/12c>
Trace; 80125574 <handle_it+8/10>
Trace; 80125574 <handle_it+8/10>
Trace; 800432dc <cpu_idle+6c/74>
Trace; 800432c0 <cpu_idle+50/74>
Trace; 8020a37c <p.1+324/d38>
Trace; 8004042c <init+0/194>
Trace; 8020959c <genexcept_early+dc/9f0>

Code;  80045ad4 <handle_ades_int+28/34>
00000000 <_PC>:
Code;  80045ad4 <handle_ades_int+28/34>
   0:   03a02021   move    a0,sp
Code;  80045ad8 <handle_ades_int+2c/34>
   4:   080115e0   j       45780 <_PC+0x45780>
Code;  80045adc <handle_ades_int+30/34>
   8:   00000000   nop
Code;  80045ae0 <handle_ibe+0/cc>   <=====
   c:   401a6000   mfc0    k0,$12   <=====
Code;  80045ae4 <handle_ibe+4/cc>
  10:   00000000   nop
Code;  80045ae8 <handle_ibe+8/cc>
  14:   001ad0c0   sll     k0,k0,0x3
Code;  80045aec <handle_ibe+c/cc>
  18:   07400003   bltz    k0,28 <_PC+0x28>
Code;  80045af0 <handle_ibe+10/cc>
  1c:   03a0d821   move    k1,sp
Code;  80045af4 <handle_ibe+14/cc>
  20:   3c1b802b   lui     k1,0x802b

Kernel panic: Aiee, killing interrupt handler!

Does that help at all? If not, I may just have to assume it was a fluke that the machine crashed twice within 15 minutes of putting in the PMAD-AA card. I haven't managed to make it crash today.

Len Sorensen