Re: SGI Octane && Bridge DMA bug

On 08/28/2016 08:01, Joshua Kinard wrote:
> Trying to tackle the bug on SGI Octane systems where the machine misbehaves if
> the amount of installed RAM is >2GB.  Reading some hints from the OpenBSD
> xbridge.c driver, it seems Octane's (and maybe IP27's?) Bridge IOMMU is weird
> in that it cannot translate DMA addresses that go over 0x7fffffff
> ((1ULL << 31) - 1).  This is complicated by the fact that Octane's physical
> memory is offset by 512MB, so I think the real DMA limits need to be
> 0x20000000 to 0x9fffffff.
> 
> Been messing around in the dma-coherence.h header for Octane, and so far, with
> 4GB of RAM installed, it gets all the way down to bringing up the MD raid
> stuff, then throws an instruction bus error for address 0xffffffffa0013ea0.  I
> can't make a determination if that's a DMA address or something else.  It's
> sign-extended, so it's not any valid 64-bit address (including Crosstalk or
> something attached to HEART).  It's very consistent, though, as it's in the EPC
> register after each crash.
> 
> The problem with Linux's DMA code is it is basically rigged to handle DMA for
> PCI devices.  This includes the MIPS-specific DMA stuff.  The Impact video
> board in an Octane is not a PCI device, but rather a pure Crosstalk device, and
> it has no issues with DMA (as far as I know).  So I need to find a way to limit
> DMA addresses for the Bridge driver only, but not mangle Impact DMA addresses.
> 
> Ideas?

I think the 0xffffffffa0013ea0 address that I keep hitting from multiple,
unrelated *alloc*() functions is, by virtue of being in CKSEG1 space, an
exception handler.  Or was.  It looks like the handlers are getting blown away
somehow when something triggers an Oops.  The likely trigger is the disk layer
(MD, XFS, or qla1280) doing a DMA operation and probably (though not
confirmed) running into that Bridge issue of limited DMA addressing.
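
As a sanity check on the CKSEG1 reading, here is a minimal userspace sketch
that classifies a 64-bit MIPS virtual address and strips the segment bits.
The macro and function names are made up for this example (they are not the
kernel's own definitions); the segment boundaries are the standard MIPS64
compatibility-segment values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard MIPS64 compatibility segment boundaries (names are local to
 * this sketch, not taken from any kernel header). */
#define SKETCH_CKSEG1_BASE 0xffffffffa0000000ULL /* unmapped, uncached */
#define SKETCH_CKSEG2_BASE 0xffffffffc0000000ULL /* first address past CKSEG1 */

/* True if vaddr falls inside the 512MB CKSEG1 window. */
static bool in_ckseg1(uint64_t vaddr)
{
    return vaddr >= SKETCH_CKSEG1_BASE && vaddr < SKETCH_CKSEG2_BASE;
}

/* CKSEG1 is unmapped, so the physical address is just the low 29 bits. */
static uint64_t ckseg1_to_phys(uint64_t vaddr)
{
    return vaddr & 0x1fffffffULL;
}
```

For 0xffffffffa0013ea0, in_ckseg1() returns true and ckseg1_to_phys() gives
0x13ea0, so the faulting fetch is an uncached access to a low physical
address rather than anything that looks like a DMA address.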

I say that because when the Oops happens, the MIPS trap code dumps the stack
and registers, but when it goes to print the code trace, that trips an
instruction bus error on 0xffffffffa0013ea0, followed by one or more data bus
errors.

That is the only explanation I can think of.  Is it likely I'll have to write
Octane-specific DMA alloc functions instead of using the dma-default.c
versions?  It seems dma-coherence.h is for dealing with addresses that have
already been allocated, whereas I think I'll have to intercept the DMA calls
and make sure nothing over 0x7fffffff in physmem gets allocated for Bridge.
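
To make the limit concrete, here is a minimal userspace sketch of the range
check an intercepting allocator would need.  It assumes the window really is
0x20000000 to 0x9fffffff (the 512MB RAM offset plus a 2GB span), which is my
guess from above and not confirmed.  All names here are hypothetical and none
of this is existing kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values from this thread, not verified against hardware docs. */
#define SKETCH_RAM_BASE    0x20000000ULL  /* Octane physical memory offset */
#define SKETCH_BRIDGE_SPAN (1ULL << 31)   /* 2GB translatable by Bridge */
#define SKETCH_BRIDGE_LAST (SKETCH_RAM_BASE + SKETCH_BRIDGE_SPAN - 1)

/* True if the buffer [phys, phys + len) lies entirely inside the window. */
static bool bridge_dma_ok(uint64_t phys, size_t len)
{
    uint64_t last;

    if (len == 0 || phys < SKETCH_RAM_BASE)
        return false;

    last = phys + (uint64_t)len - 1;
    if (last < phys)                      /* address arithmetic wrapped */
        return false;

    return last <= SKETCH_BRIDGE_LAST;
}
```

A 4KB buffer at 0x9ffff000 passes since its last byte is 0x9fffffff, while
anything touching 0xa0000000 or higher fails and would have to be reallocated
or bounced.  Impact DMA would simply never go through this check.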

-- 
Joshua Kinard
Gentoo/MIPS
kumba@xxxxxxxxxx
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic



