On 08/28/2016 08:01, Joshua Kinard wrote:
> Trying to tackle the bug on SGI Octane systems where the machine misbehaves if
> the amount of installed RAM is >2GB.  Reading some hints from the OpenBSD
> xbridge.c driver, it seems Octane's (and maybe IP27's?) Bridge IOMMU is weird
> in that it cannot translate DMA addresses that go over 0x7fffffff
> ((1ULL << 31) - 1).  Which is complicated by the fact that Octane's physical
> memory is offset by 512MB, so I think the real DMA limits need to be
> 0x20000000 to 0x9fffffff.
>
> Been messing around in the dma-coherence.h header for Octane, and so far, with
> 4GB of RAM installed, it gets all the way down to bringing up the MD raid
> stuff, then throws an instruction bus error for address 0xffffffffa0013ea0.  I
> can't make a determination if that's a DMA address or something else.  It's
> sign-extended, so it's not any valid 64-bit address (including Crosstalk or
> something attached to HEART).  It's very consistent, though, as it's in the EPC
> register after each crash.
>
> The problem with Linux's DMA code is it is basically rigged to handle DMA for
> PCI devices.  This includes the MIPS-specific DMA stuff.  The Impact video
> board in an Octane is not a PCI device, but rather a pure Crosstalk device, and
> it has no issues with DMA (as far as I know).  So I need to find a way to limit
> DMA addresses for the Bridge driver only, but not mangle Impact DMA addresses.
>
> Ideas?

I think the 0xffffffffa0013ea0 address I keep hitting from multiple, unrelated
*alloc*() functions is, by virtue of being in CKSEG1 space, an exception
handler.  Or was.  It seems the handlers are getting blown away somehow when
something triggers an Oops.  The likely trigger is the disk layer (MD, XFS, or
qla1280) doing a DMA operation and probably (though not confirmed) running into
that Bridge issue of limited DMA addressing.
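
To make the address arithmetic concrete, here is a minimal userspace sketch.
It is not kernel code.  The window constants are assumptions taken straight
from the numbers above (512MB RAM offset, 2GB Bridge limit), not from hardware
documentation, and the helper names bridge_can_dma() and in_ckseg1() are
hypothetical.  The CKSEG1 bounds are the standard MIPS64 unmapped, uncached
compatibility segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the discussion above, not from hardware docs. */
#define OCTANE_RAM_BASE  0x20000000ULL   /* physical RAM starts at 512MB */
#define BRIDGE_DMA_SPAN  (1ULL << 31)    /* Bridge translates at most 2GB */
#define BRIDGE_DMA_FIRST OCTANE_RAM_BASE
#define BRIDGE_DMA_LAST  (OCTANE_RAM_BASE + BRIDGE_DMA_SPAN - 1) /* 0x9fffffff */

/* MIPS64 CKSEG1: unmapped, uncached compatibility segment. */
#define CKSEG1_FIRST 0xffffffffa0000000ULL
#define CKSEG1_LAST  0xffffffffbfffffffULL

/*
 * Hypothetical helper: true if the buffer [phys, phys + len) lies entirely
 * inside the assumed Bridge DMA window.  The comparison is written so that
 * phys + len cannot overflow.
 */
bool bridge_can_dma(uint64_t phys, uint64_t len)
{
	if (len == 0 || phys < BRIDGE_DMA_FIRST || phys > BRIDGE_DMA_LAST)
		return false;
	return len - 1 <= BRIDGE_DMA_LAST - phys;
}

/* Hypothetical helper: true if a 64-bit virtual address falls in CKSEG1. */
bool in_ckseg1(uint64_t vaddr)
{
	return vaddr >= CKSEG1_FIRST && vaddr <= CKSEG1_LAST;
}
```

Under these assumptions, bridge_can_dma(0x20000000, 4096) is true, a buffer
starting at 0xa0000000 is rejected, and in_ckseg1(0xffffffffa0013ea0) is true.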

The reason I suspect the handlers are being clobbered: when the Oops happens,
the MIPS trap code dumps the stack and registers, but when it goes to print the
code trace, that trips an instruction bus error on 0xffffffffa0013ea0, followed
by one or more data bus errors.  That's the only explanation I can think of.

Is it likely I'll have to write Octane-specific DMA alloc functions instead of
using the default-dma.c versions?  It seems dma-coherence.h is for dealing with
addresses that have already been allocated, whereas I think I'll have to
intercept the DMA calls and make sure nothing over 0x7fffffff in physmem gets
allocated for Bridge.

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic