On 05/18/2015 01:39, Joshua Kinard wrote:
> So I've gotten the second CPU in Octane to "tick" again...somehow.  I am
> certain someone's cat went missing in the process...

So, yeah, the problem appears to be specific to the R14000 CPU module.  I
swapped in an R12K dual CPU module, and after a little bit of tinkering to
revert a few hacks and clean up the code, it boots into SMP, mounts the
userland, and has successfully sync'ed a Gentoo Portage tree w/o annihilating
the XFS filesystem or the MD RAID5 array.  Even compiled a few C files.

# cat /proc/interrupts
           CPU0       CPU1
 14:          0          0  HEART  powerbtn
 15:          0          0  HEART  acfail
 16:          0      44887  HEART  qla1280
 17:          0      16904  HEART  qla1280
 18:       1853          0  HEART  ioc3-eth
 20:        243          0  HEART  ioc3-io
 46:     348850          0  HEART  cpu0-ipi
 47:          0     315948  HEART  cpu1-ipi
 50:       1268          0  HEART  heart_timer
 71:     118453     195177  CPU    timer

# cat /proc/cpuinfo
system type             : SGI Octane
machine                 : Unknown
processor               : 0
cpu model               : R12000 V3.5  FPU V0.0
BogoMIPS                : 600.47
byteorder               : big endian
wait instruction        : no
microsecond timers      : yes
tlb_entries             : 64
extra interrupt vector  : no
hardware watchpoint     : yes, count: 0, address/irw mask: []
isa                     : mips2 mips3 mips4
ASEs implemented        :
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 1
cpu model               : R12000 V3.5  FPU V0.0
BogoMIPS                : 600.47
byteorder               : big endian
wait instruction        : no
microsecond timers      : yes
tlb_entries             : 64
extra interrupt vector  : no
hardware watchpoint     : yes, count: 0, address/irw mask: []
isa                     : mips2 mips3 mips4
ASEs implemented        :
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VCED exceptions         : not available
VCEI exceptions         : not available

I even got the IRQs to be fanned out across both CPUs.  Well, primarily the
qla1280 drivers.  They randomly hop between both CPUs, but no ill effects so
far.

But if I boot that *same* working kernel on an R14000 dual module, I get
handed an IBE as soon as the userland mounts.
The only documented difference I can find on the R14000 is that it supports
DDR memory, i.e., it can do memory operations on both the rising and falling
edge of each clock.  Not sure if that matters to the kernel at all, but I know
of nothing else that describes the R14K's internals, such as whether there's
some new bit in the CP0 config, branch-diagnostic, or status registers, etc.,
that might explain why these IBEs are happening.

Guess I need to hunt down my old dual R10K module next and verify that it
works fine...

Also, is there a way to hardcode the cca=5 setting for IP30?  Maybe it needs
to be a hidden Kconfig item?  I tried setting cpu->writecombine in
cpu-probe.c, but no dice there.  If I boot an SMP kernel on dual R12Ks w/o
cca=5, I'll get one or two pretty-specific oopses.  The one I did grab
complains about bad spinlock magic somewhere in the core tty driver.  I can
transcribe that oops later on if interested.

--J