2016-08-28 9:58 GMT-07:00 Joshua Kinard <kumba@xxxxxxxxxx>:
> On 08/28/2016 08:01, Joshua Kinard wrote:
>> Trying to tackle the bug on SGI Octane systems where the machine misbehaves if
>> the amount of installed RAM is >2GB. Reading some hints from the OpenBSD
>> xbridge.c driver, it seems Octane's (and maybe IP27's?) Bridge IOMMU is weird
>> in that, it cannot translate DMA addresses that go over 0x7fffffff (1ULL <<
>> 31). Which is complicated by the fact that Octane's physical memory is offset
>> by 512MB, so I think the real DMA limits need to be 0x20000000 to 0x9fffffff.
>>
>> Been messing around in the dma-coherence.h header for Octane, and so far, with
>> 4GB of RAM installed, it gets all the way down to bringing up the MD raid
>> stuff, then throws an instruction bus error for address 0xffffffffa0013ea0. I
>> can't make a determination if that's a DMA address or something else. It's
>> sign-extended, so it's not any valid 64-bit address (including Crosstalk or
>> something attached to HEART). It's very consistent, though, as it's in the EPC
>> register after each crash.
>>
>> The problem with Linux's DMA code is it is basically rigged to handle DMA for
>> PCI devices. This includes the MIPS-specific DMA stuff. The Impact video
>> board in an Octane is not a PCI device, but rather a pure Crosstalk device, and
>> it has no issues with DMA (as far as I know). So I need to find a way to limit
>> DMA addresses for the Bridge driver only, but not mangle Impact DMA addresses.
>>
>> Ideas?
>
> I think the 0xffffffffa0013ea0 address I keep hitting from multiple, unrelated
> *alloc*() functions is, by virtue of being in CKSEG1 space, an exception
> handler. Or was. Seems like those are getting blown away somehow when
> something triggers an Oops -- seems the disk layer (MD, XFS, or qla1280), doing
> a DMA function and probably (though not confirmed) running into that Bridge
> issue of limited DMA addressing.
>
> Cause it seems that when the Oops happens, the MIPS trap code dumps the stack
> and registers, but when it goes to print the code trace, that trips up an
> instruction bus error on 0xffffffffa0013ea0, followed by one or more data bus
> errors.
>
> Seems to be the only explanation that I can think of. Is it likely I'll have
> to write Octane-specific DMA alloc functions instead of the default-dma.c
> versions? It seems dma-coherence.h is for dealing with addresses that have
> already been allocated, when I think I'll have to intercept the DMA calls and
> make sure nothing over 0x7fffffff in physmem for Bridge gets allocated.

Regarding your first question, for all plat_dma_* operations you should be
able to inspect the struct device properties and provide the correct
implementation based on whether this device is a child of the Bridge IOMMU
or not (e.g. by looking at dev->parent.name?).

You are right that this only works for addresses that have already been
allocated. If you also need to make sure that the allocation itself falls
within a particular range, which dma-default.c does not take care of, then
either setting an appropriate dma_mask or providing a custom implementation
of dma_map_ops may be required here.

HTH
--
Florian
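
For illustration only, here is a minimal userspace C sketch of the range check
implied by the limits quoted above. It assumes the figures from the thread are
right (physical memory starting at 0x20000000 and a 31-bit Bridge translation
span, giving a window of 0x20000000 to 0x9fffffff); those are guesses from the
discussion, not verified hardware facts. The names OCTANE_RAM_BASE,
BRIDGE_DMA_SPAN, BRIDGE_DMA_LAST, bridge_dma_reachable() and
bridge_dma_selftest() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, taken from the thread rather than from hardware docs. */
#define OCTANE_RAM_BASE 0x20000000ULL   /* physical memory offset (512MB) */
#define BRIDGE_DMA_SPAN (1ULL << 31)    /* assumed 2GB translation span */
#define BRIDGE_DMA_LAST (OCTANE_RAM_BASE + BRIDGE_DMA_SPAN - 1) /* 0x9fffffff */

/*
 * Hypothetical helper: returns true if the buffer [phys, phys + size)
 * lies entirely inside the assumed Bridge DMA window.
 */
bool bridge_dma_reachable(uint64_t phys, uint64_t size)
{
    if (size == 0 || phys < OCTANE_RAM_BASE || phys > BRIDGE_DMA_LAST)
        return false;
    /* Compare against the remaining room so phys + size cannot overflow. */
    return size - 1 <= BRIDGE_DMA_LAST - phys;
}

/* Hypothetical self-check exercising the edges of the window. */
void bridge_dma_selftest(void)
{
    assert(bridge_dma_reachable(0x20000000ULL, 4096));   /* first page */
    assert(bridge_dma_reachable(0x9ffff000ULL, 4096));   /* last page */
    assert(!bridge_dma_reachable(0x9ffff000ULL, 4097));  /* crosses the end */
    assert(!bridge_dma_reachable(0xa0000000ULL, 1));     /* above the window */
    assert(!bridge_dma_reachable(0x1fffffffULL, 1));     /* below RAM base */
}
```

In a real port, a check like this would only tell you that a buffer is out of
reach; constraining where buffers are allocated in the first place would still
have to come from the dma_mask or from custom DMA ops, as discussed above.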