Re: SGI Octane && Bridge DMA bug

2016-08-28 9:58 GMT-07:00 Joshua Kinard <kumba@xxxxxxxxxx>:
> On 08/28/2016 08:01, Joshua Kinard wrote:
>> Trying to tackle the bug on SGI Octane systems where the machine misbehaves if
>> the amount of installed RAM is >2GB.  Reading some hints from the OpenBSD
>> xbridge.c driver, it seems Octane's (and maybe IP27's?) Bridge IOMMU is weird
>> in that it cannot translate DMA addresses that go over 0x7fffffff, i.e.
>> (1ULL << 31) - 1.  Which is complicated by the fact that Octane's physical
>> memory is offset by 512MB, so I think the real DMA limits need to be
>> 0x20000000 to 0x9fffffff.
>>
>> Been messing around in the dma-coherence.h header for Octane, and so far, with
>> 4GB of RAM installed, it gets all the way down to bringing up the MD raid
>> stuff, then throws an instruction bus error for address 0xffffffffa0013ea0.  I
>> can't make a determination if that's a DMA address or something else.  It's
>> sign-extended, so it's not any valid 64-bit address (including Crosstalk or
>> something attached to HEART).  It's very consistent, though, as it's in the EPC
>> register after each crash.
>>
>> The problem with Linux's DMA code is it is basically rigged to handle DMA for
>> PCI devices.  This includes the MIPS-specific DMA stuff.  The Impact video
>> board in an Octane is not a PCI device, but rather a pure Crosstalk device, and
>> it has no issues with DMA (as far as I know).  So I need to find a way to limit
>> DMA addresses for the Bridge driver only, but not mangle Impact DMA addresses.
>>
>> Ideas?
>
> I think the 0xffffffffa0013ea0 address I keep hitting from multiple, unrelated
> *alloc*() functions is, by virtue of being in CKSEG1 space, an exception
> handler.  Or was.  Seems like those are getting blown away somehow when
> something triggers an Oops -- seems the disk layer (MD, XFS, or qla1280), doing
> a DMA function and probably (though not confirmed) running into that Bridge
> issue of limited DMA addressing.
>
> Cause it seems that when the Oops happens, the MIPS trap code dumps the stack
> and registers, but when it goes to print the code trace, that trips up an
> instruction bus error on 0xffffffffa0013ea0, followed by one or more data bus
> errors.
>
> Seems to be the only explanation that I can think of.  Is it likely I'll have
> to write Octane-specific DMA alloc functions instead of the dma-default.c
> versions?  It seems dma-coherence.h is for dealing with addresses that have
> already been allocated, when I think I'll have to intercept the DMA calls and
> make sure nothing over 0x7fffffff in physmem for Bridge gets allocated.

Regarding your first question: for all plat_dma_* operations, you
should be able to inspect the struct device properties and provide the
correct implementation based on whether this device is a child of the
Bridge IOMMU or not (e.g. by looking at dev->parent, for instance its
name via dev_name(dev->parent)?).
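
To make the parent check a bit more concrete, here is a minimal userspace
sketch, not kernel code. struct mock_device is a simplified stand-in for
struct device, and both the "bridge" name and the dev_is_under_bridge()
helper are hypothetical; the name the Bridge device actually registers
under may differ. In the kernel the comparison would go through
dev_name(), and walking further up than the immediate parent is an
assumption about how deep the device tree is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct device (illustration only). */
struct mock_device {
	const char *name;                 /* what dev_name() would return */
	const struct mock_device *parent; /* NULL at the root of the tree */
};

/* Hypothetical name; the real Bridge device may be named differently. */
#define BRIDGE_DEV_NAME "bridge"

/*
 * Return true if dev itself, or any of its ancestors, is the Bridge.
 * A plat_dma_* hook could use a check like this to decide whether the
 * restricted Bridge address handling applies to the device.
 */
static bool dev_is_under_bridge(const struct mock_device *dev)
{
	for (; dev != NULL; dev = dev->parent) {
		if (dev->name != NULL && strcmp(dev->name, BRIDGE_DEV_NAME) == 0)
			return true;
	}
	return false;
}
```

With a check of this kind, a SCSI controller hanging off the Bridge would
take the restricted path, while a pure Crosstalk device such as Impact,
which has no Bridge ancestor, would be left alone.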

You are right that this only works for addresses that have already
been allocated. If you also need to make sure that the allocation falls
within a particular range, which dma-default.c does not take care of,
then either setting an appropriate dma_mask or providing a custom
dma_map_ops implementation may be required here.
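
As a rough illustration of the range constraint, the sketch below checks
whether a buffer lies entirely inside the window you proposed (0x20000000
to 0x9fffffff, i.e. 2GB starting at the 512MB RAM offset). It is plain
userspace C; the macro names and bridge_can_dma() are hypothetical, and
the window itself is taken from your description rather than verified
against the hardware. A custom allocation path could reject or retry
buffers that fail a test like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the thread; treat them as assumptions. */
#define OCTANE_RAM_BASE    0x20000000ULL   /* physical memory starts at 512MB */
#define BRIDGE_WINDOW_SIZE (1ULL << 31)    /* Bridge can address 2GB */
#define BRIDGE_DMA_LAST    (OCTANE_RAM_BASE + BRIDGE_WINDOW_SIZE - 1)

/*
 * Return true if the buffer [paddr, paddr + size - 1] fits entirely
 * inside the assumed Bridge DMA window. The comparison is arranged so
 * that paddr + size is never computed and cannot overflow.
 */
static bool bridge_can_dma(uint64_t paddr, uint64_t size)
{
	if (size == 0 || size > BRIDGE_WINDOW_SIZE)
		return false;
	if (paddr < OCTANE_RAM_BASE)
		return false;
	return paddr <= BRIDGE_DMA_LAST - (size - 1);
}
```

Note that because of the 512MB offset, this window does not line up with
a plain 31-bit mask on physical addresses, so a dma_mask alone may or may
not be enough depending on how addresses are translated.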

HTH
-- 
Florian



