On 05/23/2015 23:17, Joshua Kinard wrote:
> On 05/18/2015 01:39, Joshua Kinard wrote:
>> So I've gotten the second CPU in Octane to "tick" again...somehow. I am
>> certain someone's cat went missing in the process...
>
> So, yeah, the problem appears to be specific to the R14000 CPU module. I
> swapped in an R12K dual CPU module, and after a little bit of tinkering to
> revert a few hacks and clean up the code, it boots into SMP, mounts the
> userland, and has successfully sync'ed a Gentoo Portage tree w/o annihilating
> the XFS filesystem or the MD RAID5 array. Even compiled a few C files.
>
[snip]
>
> I even got the IRQs to be fanned out across both CPUs. Well, primarily the
> qla1280 drivers. They randomly hop between both CPUs, but no ill effects so far.
>
> But if I boot that *same* working kernel on an R14000 dual module, I get handed
> an IBE as soon as the userland mounts. The only documented differences that I
> can find on the R14000 is that it supports DDR memory, being able to do memory
> operations on the rising edge and falling edge of each clock. Not sure if that
> matters to the kernel at all, but I know of nothing else that describes the
> R14K's internals, such as if there's some new bit in CP0 config,
> branch-diagnostic, status, etc, that might explain why these IBE's are happening.
>
> Guess I need to hunt down my old dual R10K module next and verify that works
> fine...
>
> Also, is there a way to hardcode the cca=5 setting for IP30? Maybe it needs to
> be a hidden Kconfig item?. I tried setting cpu->writecombine in cpu-probe.c,
> but no dice there. If I boot an SMP kernel on dual R12K's w/o cca=5, I'll get
> one or two pretty-specific oopses. The one I did grab complains about bad
> spinlock magic in the core tty driver somewhere. I can transcribe that oops
> later on if interested.

So far, the problem looks to have been blindly assigning all 64 HEART IRQs to 'handle_level_irq', including the SMP IPI IRQs.
I fixed that by assigning the four IPI IRQs and the four unused debug IRQs to
'handle_percpu_irq' instead. So far, no bus errors, even on the R14000. I also
tested 16KB PAGE_SIZE successfully, again with no bus errors.

Next up is 64KB PAGE_SIZE w/ CONFIG_TRANSPARENT_HUGEPAGE, a combination that
was pretty good at triggering bus errors.

</jinx>

--J
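
P.S. For anyone following along, here is a rough userspace model of the handler split described above. It is only a sketch, not the actual IP30 patch: the IRQ numbers for the IPI and debug ranges are placeholders I made up for illustration, and the enum simply stands in for the kernel's 'handle_level_irq' and 'handle_percpu_irq' flow handlers.

```c
#include <assert.h>

#define HEART_NUM_IRQS 64

/*
 * Hypothetical placeholders: this sketch does not know which HEART IRQ
 * numbers actually carry the IPIs or the unused debug interrupts.
 */
#define IPI_IRQ_FIRST   46u
#define IPI_IRQ_COUNT   4u
#define DEBUG_IRQ_FIRST 42u
#define DEBUG_IRQ_COUNT 4u

/* Stand-ins for the kernel flow handlers named in the mail. */
enum flow_handler {
    FLOW_LEVEL,   /* models handle_level_irq  */
    FLOW_PERCPU   /* models handle_percpu_irq */
};

/* True if irq lies in [first, first + count). */
int in_range(unsigned int irq, unsigned int first, unsigned int count)
{
    return irq >= first && irq < first + count;
}

/* Pick the flow handler for one HEART IRQ instead of using level for all. */
enum flow_handler heart_flow_for_irq(unsigned int irq)
{
    assert(irq < HEART_NUM_IRQS);

    if (in_range(irq, IPI_IRQ_FIRST, IPI_IRQ_COUNT) ||
        in_range(irq, DEBUG_IRQ_FIRST, DEBUG_IRQ_COUNT))
        return FLOW_PERCPU;

    return FLOW_LEVEL;
}

/* Fill a table with the chosen handler for every HEART IRQ. */
void heart_assign_flows(enum flow_handler table[HEART_NUM_IRQS])
{
    unsigned int irq;

    for (irq = 0; irq < HEART_NUM_IRQS; irq++)
        table[irq] = heart_flow_for_irq(irq);
}

/* Count how many IRQs in the table use the given handler. */
unsigned int count_flows(const enum flow_handler table[HEART_NUM_IRQS],
                         enum flow_handler which)
{
    unsigned int irq, n = 0;

    for (irq = 0; irq < HEART_NUM_IRQS; irq++)
        if (table[irq] == which)
            n++;

    return n;
}
```

In the real driver the per-IRQ choice would be passed to something like irq_set_chip_and_handler() during IRQ setup; the point of the model is just that eight of the 64 lines (four IPI, four debug) no longer go through the level handler.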