On Mon, Oct 06, 2003 at 07:05:06PM -0700, Steve Scott wrote:

> We tried the fault.c patch Jun suggested, but it didn't solve the problem we were
> having with the BUG() in schedule(). The patch at the beginning of
> except_vec3_generic for the Vr5432 bug had previously been installed.
>
> While chasing the BUG() in schedule(), though, we ran across another BUG() in
> alloc_skb() in ...linux/net/core/skbuff.c. :
>
>     alloc_skb called nonatomically from interrupt 80117acc
>     kernel BUG at skbuff.c:179!
>
> We changed the way sock_init_data initializes the 'allocation' field and
> were able to get past this one (see attached sock.c.patch). We're not sure
> if this fix needs to be permanent, or if it's just a temporary workaround.
>
> For the schedule() BUG(), all evidence that we collected pointed to some
> interrupt causing us to reenter schedule() (i.e., somehow schedule() was
> called during an interrupt handler). We suspected something being run
> from the timer interrupt bottom half, but were never able to prove it. We
> also thought a remote possibility might be a pipeline hazard in the MIPS
> causing the EPC register not to update on a nested exception, but NEC says
> that can't happen on the Vr5432 that we're using...

Can't happen on any MIPS.

> We finally worked around the schedule BUG() by disabling interrupts
> during the context switch in schedule(). This workaround required changes
> in linux/kernel/sched.c and linux/arch/mips/kernel/r4k_switch.S (see attached
> patches).

Ouch. Forgive me, but if I weren't already ignoring these patches for being
ed-style, I'd ignore them for being completely broken - these patches are
harmful for performance and probably not going to achieve stability by
anything other than luck ...

  Ralf
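
For reference, the "alloc_skb called nonatomically from interrupt" message quoted above is, as far as I recall from 2.4-era sources, triggered when alloc_skb() is asked for a sleeping allocation while in interrupt context. Below is a minimal user-space sketch of that condition only. It is not kernel code: every `SKETCH_`/`sketch_` name and flag value is a hypothetical stand-in, and the real check and flag layout in skbuff.c may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's allocation flags.
 * The real values differ; only the "may sleep" bit matters here. */
#define SKETCH_GFP_WAIT   0x01u                      /* allocation may sleep */
#define SKETCH_GFP_ATOMIC 0x00u                      /* must not sleep */
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | 0x02u)  /* normal, may sleep */

/* Hypothetical stand-in for the kernel's notion of interrupt context. */
static bool sketch_in_interrupt = false;

/* True when an allocation with this mask would trip the
 * "called nonatomically from interrupt" check. */
static bool sketch_alloc_would_bug(unsigned int gfp_mask)
{
    return sketch_in_interrupt && (gfp_mask & SKETCH_GFP_WAIT) != 0;
}

/* Walk through the four combinations of context and mask. */
static void sketch_demo(void)
{
    sketch_in_interrupt = false;
    assert(!sketch_alloc_would_bug(SKETCH_GFP_KERNEL)); /* process context: fine */
    assert(!sketch_alloc_would_bug(SKETCH_GFP_ATOMIC));

    sketch_in_interrupt = true;
    assert(sketch_alloc_would_bug(SKETCH_GFP_KERNEL));  /* sleeping alloc in irq */
    assert(!sketch_alloc_would_bug(SKETCH_GFP_ATOMIC)); /* atomic alloc is fine */
}
```

Under this model, switching the socket's 'allocation' field to an atomic mask silences the check, but it does not explain why the allocation is being reached from interrupt context in the first place.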