Geert, I had exactly the same problem with Vr4120A chip! I have narrowed it down to be a hardware bug. Basically under certain conditions, BD flag won't get set. You can verify that by inserting various number of "nop" just before the faulting places and observe certain address alignment would show/hide this bug. Further more I wrote a standalone kernel code and could not reproduce it, which means it requires more conditions than just address alignment. NEC Europe knows about this problem. Not sure if they passed it to Japan where the chip is designed. Their engineers even had difficulty to understand what I was talking about. (more sighs) Jun On Mon, Jan 13, 2003 at 05:13:17PM +0100, Geert Uytterhoeven wrote: > > I'm seeing a crash in 2.4.20 in emulate_load_store_insn(), when accepting a TCP > connection (exact line number influenced by debug code): > > Unhandled kernel unaligned access or invalid instruction in unaligned.c::emulate_load_store_insn, line 492: > $0 : 00000000 10008400 30000000 00000000 83c2a380 83d9f80e 838941c0 00000001 > $8 : 00000016 c0a80002 c0a80001 00000016 83f326a4 83f326a8 83f326a0 00000000 > $16: 83c2a43c 811af440 00000000 83c2a380 803da18c 00000000 00000000 00000000 > $24: 00000000 2ac41330 8039a000 8039baf8 a38415b4 8033eea4 > Hi : 00000000 > Lo : 00000140 > epc : 80346448 Not tainted > Status: 10008403 > Cause : 00000010 > Process swapper (pid: 0, stackpage=8039a000) > Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8039a000 > 00000001 810d0060 802dd370 00000000 00000000 8039bb70 00000000 8041a690 > 803d2000 810c5de0 8041a620 810d0060 802dd370 80213fa4 810c41a0 8039bbc8 > 8020ad50 ... > Call Trace: [<802dd370>] [<802dd370>] [<80213fa4>] [<8020ad50>] [<802ea344>] > [<802ea2fc>] [<80307e08>] [<802378f8>] [<8020a0d4>] [<8020a0d4>] [<802061d8>] > [<802061d8>] [<8020a0d4>] [<8033eea4>] [<80346fbc>] [<80347060>] [<8034716c>] > [<803476f4>] [<80329a50>] [<80326648>] [<8032952c>] [<80329ddc>] [<80329d98>] > [<80329ddc>] [<8031700c>] [<80329790>] [<8031700c>] [<80316bb4>] [<803172b8>] > [<802df95c>] [<8021bf30>] [<80317500>] [<80316ecc>] [<8021b810>] [<80379278>] > [<8020ad50>] [<8020aeb0>] [<8020ae84>] [<80379228>] [<80204250>] ... > > Code: 8cc30064 3c023000 00621824 <14600012> 8cb50010 8c840238 8c820004 90830000 00621007 > Kernel panic: Aiee, killing interrupt handler! > In interrupt handler - not syncing > > 803463f8 <tcp_v4_conn_request>: > 803463f8: 27bdfe20 addiu sp,sp,-480 > 803463fc: afb601d8 sw s6,472(sp) > 80346400: afb301cc sw s3,460(sp) > 80346404: afb101c4 sw s1,452(sp) > 80346408: afbf01dc sw ra,476(sp) > 8034640c: afb501d4 sw s5,468(sp) > 80346410: afb401d0 sw s4,464(sp) > 80346414: afb201c8 sw s2,456(sp) > 80346418: afb001c0 sw s0,448(sp) > 8034641c: 00a08821 move s1,a1 > 80346420: 8ca50020 lw a1,32(a1) > 80346424: 8e260028 lw a2,40(s1) > 80346428: 8e320044 lw s2,68(s1) > 8034642c: 8ca2000c lw v0,12(a1) > 80346430: 00809821 move s3,a0 > 80346434: 0000b021 move s6,zero > 80346438: afa201b8 sw v0,440(sp) > 8034643c: 8cc30064 lw v1,100(a2) > 80346440: 3c023000 lui v0,0x3000 > 80346444: 00621824 and v1,v1,v0 > 80346448: 14600012 bnez v1,80346494 <tcp_v4_conn_request+0x9c> > 8034644c: 8cb50010 lw s5,16(a1) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > 80346450: 8c840238 lw a0,568(a0) > 80346454: 8c820004 lw v0,4(a0) > 80346458: 90830000 lbu v1,0(a0) > 8034645c: 00621007 srav v0,v0,v1 > > If I print the parameters at label `sigill' in emulate_load_store_insn(), I > get: > > pc 0x80346448 addr 0x83d9f81e ins 0x14600012 > > And emulate_load_store_insn() gets confused because 0x14600012 is not a > load/store. 0x14600012 is the branch instruction before the load, not the load > after the branch instruction! Note that bit 31 of cause (CAUSEF_BD) is not set. > Some more investigations showed that the branch is indeed not taken. > > Apparently if an unaligned access happens right after a branch which is not > taking, epc points to the branch instruction, and CAUSEF_BD is not set > (technically speaking, this is not a branch delay, since the branch is not > taken :-). Is this expected behavior? The CPU is a VR4120A core. > > As a workaround, I assume I can just test whether pc points to a branch > instruction, and increment pc if that's the case? > > Thanks! > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds > >