On 03/16/2017 10:09, Ralf Baechle wrote:
> On Wed, Mar 15, 2017 at 11:50:44PM -0400, Joshua Kinard wrote:
>
>> On 03/15/2017 16:11, Joshua Kinard wrote:
>>> I've reported in the past that turning on CONFIG_DEBUG_LOCK_ALLOC produces a
>>> kernel that can't boot on several SGI platforms. It turns out that using
>>> arcload (Stan's bootloader originally written for IP30), I can get some
>>> debugging out on why. I am still puzzled, but maybe this information can be
>>> interpreted by someone else into something meaningful?
>>>
>>> All addresses printed out of arcload are physical address.
>>>
>>> ARCS Memory Map as printed by some debugging I added to the arcload binary:
>>>
>>> 0x00000000 - 0x00001000 ExceptionBlock
>>> 0x00001000 - 0x00002000 SystemParameterBlock
>>> 0x00002000 - 0x00004000 FirmwarePermanent
>>> 0x20004000 - 0x20f00000 FreeMemory***
>>> 0x20f00000 - 0x21000000 FirmwareTemporary
>>> 0x21000000 - 0x5fff0000 FreeMemory
>>> 0x5fff0000 - 0x5ffff000 LoadedProgram
>>> 0x5ffff000 - 0x60000000 FreeMemory
>>> 0x60000000 - 0xa0000000 FirmwarePermanent
>>
>> So it turns out I can get away, on Octane at least, by changing the load
>> address from 0x20004000 to an arbitrary value in the other FreeMemory segment
>> from 0x21000000 - 0x5fff0000. Specifically, using 0x21004000 appears to work
>> without any ill effects.
>>
>> The 0x20004000 value is the address used by IRIX to load (with symon, it
>> becomes 0x200800000 instead). I'll have to try this on the IP27 later on as
>> well. On Octane, CONFIG_DEBUG_LOCK_ALLOC didn't toss up any major locking
>> issues yet. Probably need to hammer the disks with bonnie++ or such. At least
>> I can get back to the BRIDGE/PCI mess now...
>
> I'm wondering where the ARC stack is on kernel entry if maybe the
> ARC stack has corrupted the kernel? If possible, can you get your
> kernel or a test program to compute a checksum over itself to see
> if it has been corrupted?
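
For reference, here is a rough sketch of the arithmetic on the memory map quoted above: how much room there is between a given load address and the end of the FreeMemory segment that contains it. The function names and the 20 MiB image size are made up for illustration (I have not measured the real footprint of a CONFIG_DEBUG_LOCK_ALLOC kernel), and the sketch simply takes the ARCS map at face value.

```python
# ARCS memory map as quoted above: (start, end, type).
# Physical addresses, end exclusive. The "***" marker on the first
# FreeMemory entry is dropped so both entries share one type string.
ARCS_MAP = [
    (0x00000000, 0x00001000, "ExceptionBlock"),
    (0x00001000, 0x00002000, "SystemParameterBlock"),
    (0x00002000, 0x00004000, "FirmwarePermanent"),
    (0x20004000, 0x20F00000, "FreeMemory"),
    (0x20F00000, 0x21000000, "FirmwareTemporary"),
    (0x21000000, 0x5FFF0000, "FreeMemory"),
    (0x5FFF0000, 0x5FFFF000, "LoadedProgram"),
    (0x5FFFF000, 0x60000000, "FreeMemory"),
    (0x60000000, 0xA0000000, "FirmwarePermanent"),
]


def room_from(load_addr, memory_map):
    """Bytes between load_addr and the end of the FreeMemory segment
    containing it, or 0 if load_addr is not in a FreeMemory segment."""
    for start, end, kind in memory_map:
        if kind == "FreeMemory" and start <= load_addr < end:
            return end - load_addr
    return 0


def fits(load_addr, image_size, memory_map):
    """True if image_size bytes starting at load_addr stay inside one
    FreeMemory segment."""
    return image_size <= room_from(load_addr, memory_map)


if __name__ == "__main__":
    # Hypothetical in-memory footprint (text + data + bss), NOT a measurement.
    hypothetical_image = 20 * 2**20

    for addr in (0x20004000, 0x21004000):
        room = room_from(addr, ARCS_MAP)
        verdict = "fits" if fits(addr, hypothetical_image, ARCS_MAP) else "does not fit"
        print(f"load at {addr:#010x}: {room / 2**20:.2f} MiB available, "
              f"hypothetical 20 MiB image {verdict}")
```

What matters is the kernel's full in-memory footprint, bss included, not just the size of the file on disk.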

As far as I can tell, it really does seem to be a sizing issue. I don't have
the time to dive into exactly what CONFIG_DEBUG_LOCK_ALLOC does, but I found
one hit on LKML (lost the URL) indicating that it fluffs up a particular,
very common struct and so introduces a fair bit of bloat. It seems possible
that the 0x20004000-0x20f00000 segment (just under 15MB) really is too small.
I wouldn't rule out the possibility that SGI designed ARCS on the Octane to
allow only IRIX to load at this particular address, and Linux has just gotten
lucky thus far.

As for whether loading at the next FreeMemory segment in 0x21000000-0x5fff0000
smashes any ARCS segments, that I am not sure about. A kernel loaded in that
segment does boot, and seems to behave no differently than a kernel loaded in
the other segment, including exhibiting the same bugs.

Like IP27, Octane has no need for ARCS after the kernel boots: resetting the
system can be done by flipping a bit in HEART, and power down is handled by
the RTC driver (this feature broke, though, and I haven't chased down why
yet). So if we're clobbering ARCS using this load address...well, it can't be
all that bad </famous-last-words>

I'll see what IP27 does, assuming it even has a large enough FreeMemory segment
to work with.

> Let me repeat my ARC(S) mantra again, ARC(S) is broken, ARC(S) lies.
> Trust is futile. Even if ARC(S) claims something is free I'd rather
> not rely on it.

Apparently, and only on Octane, ARCS detects and maps out only the first 1GB of
RAM. All remaining RAM installed in the system is marked as FirmwarePermanent
and mapped at 0x60000000 and up.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us. And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic