Re: radeon ring 0 test failed on arm64


 



On Thu, Mar 17, 2022 at 9:17 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> On 2022-03-17 12:26, Peter Geis wrote:
> > On Thu, Mar 17, 2022 at 6:37 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
> >>
> >> On 2022-03-17 00:14, Peter Geis wrote:
> >>> Good Evening,

I've added the DesignWare driver maintainers, since the Rockchip host
driver uses the dwc driver.

> >>>
> >>> I apologize for raising this email chain from the dead, but there have
> >>> been some developments that have introduced even more questions.
> >>> I've looped the Rockchip mailing list into this too, as this affects
> >>> rk356x, and likely the upcoming rk3588 if [1] is to be believed.
> >>>
> >>> TL;DR for those not familiar: it seems the rk356x series (and possibly
> >>> the rk3588) were built without any outer coherent cache.
> >>> This means (unless Rockchip wants to clarify here) devices such as the
> >>> ITS and PCIe cannot utilize cache snooping.
> >>> This is based on the results of the email chain [2].
> >>>
> >>> The new circumstances are as follows:
> >>> The RPi CM4 Adventure Team, as I've taken to calling them, has been
> >>> attempting to get a dGPU working with the very broken Broadcom PCIe
> >>> controller in the RPi CM4.
> >>> Recently they acquired a SoQuartz rk3566 module, which is pin-compatible
> >>> with the CM4, and have taken to trying it out as well.
> >>>
> >>> This is how I got involved.
> >>> It seems they found a trivial way to force the Radeon R600 driver to
> >>> use Non-Cached memory for everything.
> >>> This single line change, combined with using memset_io instead of
> >>> memset, allows the ring tests to pass and the card probes successfully
> >>> (minus the DMA limitations of the rk356x due to the 32-bit
> >>> interconnect).
> >>> Using this method, I discovered that we start getting unaligned I/O
> >>> memory access faults (bus errors) when running glmark2-drm (running
> >>> glmark2 directly was impossible, as both X and Wayland crashed too
> >>> early).
> >>> I traced this to what I thought at the time was an unsafe memcpy
> >>> in the Mesa stack.
> >>> Rewriting this function to force aligned writes solved the problem and
> >>> allows glmark2-drm to run to completion.
> >>> After some extensive debugging, I found about half a dozen memcpy
> >>> functions in Mesa that, if forced to be aligned, would allow Wayland
> >>> to start, but with hilarious display corruption (see [3], [4]).
> >>> The CM4 team is convinced this is an issue with memcpy in glibc, but
> >>> I'm not convinced it's that simple.
> >>>
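
A rough userspace sketch of the "force aligned writes" workaround described above might look like the C below. It assumes the destination is a mapping of GPU memory where every store must be naturally aligned for its size; the helper names aligned_copy_to_io() and aligned_fill_io() are hypothetical and are not taken from Mesa or the kernel (in-kernel code would use memcpy_toio()/memset_io() instead). Bytes are written one at a time until the destination reaches a 4-byte boundary, then as whole 32-bit words, then the remaining tail bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes into a mapping where every store must
 * be naturally aligned for its size (e.g. a BAR mapped as Device memory).
 * Only 1-byte stores and 4-byte-aligned 32-bit stores are issued. */
static void aligned_copy_to_io(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;

    /* Head: single bytes until d reaches a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: aligned 32-bit stores. The source is ordinary RAM and may be
     * unaligned, so gather each word into a local with memcpy first. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        *(volatile uint32_t *)d = w;
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: remaining 0-3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}

/* Hypothetical memset_io()-style fill following the same alignment rules. */
static void aligned_fill_io(volatile void *dst, uint8_t c, size_t n)
{
    volatile uint8_t *d = dst;
    uint32_t pattern = 0x01010101u * c; /* replicate c into all 4 bytes */

    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = c;
        n--;
    }
    while (n >= 4) {
        *(volatile uint32_t *)d = pattern;
        d += 4;
        n -= 4;
    }
    while (n > 0) {
        *d++ = c;
        n--;
    }
}
```

The volatile accesses stop the compiler from merging or widening the stores. Whether 32-bit words are the right granularity for a particular BAR is an assumption here, not something established in this thread.
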
> >>> On my two-hour drive in to work this morning, I got to thinking.
> >>> If this were a memcpy fault, it would be universally broken on arm64,
> >>> which is obviously not the case.
> >>> So I started thinking about what is different here compared with
> >>> systems known to work:
> >>> 1. No IOMMU for the PCIe controller.
> >>> 2. The Outer Cache Issue.
> >>>
> >>> Robin:
> >>> My questions for you, since you're the smartest person I know about
> >>> arm64 memory management:
> >>> Could cache snooping permit unaligned accesses to IO to be safe?
> >>
> >> No.
> >>
> >>> Or
> >>> Is it the lack of an IOMMU that's causing the alignment faults to become fatal?
> >>
> >> No.
> >>
> >>> Or
> >>> Am I insane here?
> >>
> >> No. (probably)
> >>
> >> CPU access to PCIe has nothing to do with PCIe's access to memory. From
> >> what you've described, my guess is that a GPU BAR gets put in a
> >> non-prefetchable window, such that it ends up mapped as Device memory
> >> (whereas if it were prefetchable it would be Normal Non-Cacheable).
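
Whether a BAR is prefetchable is advertised by the device itself, in bit 3 of each memory BAR in configuration space (`lspci -v` reports it as "prefetchable"). Below is a minimal C sketch of decoding a raw 32-bit BAR value; decode_bar() and struct bar_info are hypothetical names, and the sketch ignores the upper half of 64-bit BARs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields decoded from the low 32 bits of a PCI BAR (hypothetical struct). */
struct bar_info {
    bool is_io;        /* bit 0: 1 = I/O space, 0 = memory space */
    bool is_64bit;     /* memory BARs: type field (bits 2:1) == 0b10 */
    bool prefetchable; /* memory BARs: bit 3 */
    uint32_t base_low; /* low 32 bits of the base, flag bits masked off */
};

static struct bar_info decode_bar(uint32_t raw)
{
    struct bar_info info = {0};

    info.is_io = raw & 0x1u;
    if (info.is_io) {
        /* I/O BARs: bits 1:0 are flags, the address starts at bit 2. */
        info.base_low = raw & ~0x3u;
        return info;
    }

    /* Memory BARs: bits 3:0 are flags, the address starts at bit 4. */
    info.is_64bit = ((raw >> 1) & 0x3u) == 0x2u;
    info.prefetchable = (raw >> 3) & 0x1u;
    info.base_low = raw & ~0xFu;
    return info;
}
```

The sketch only reads the flag. Per the explanation above, the flag is what allows the BAR to land in a prefetchable window and be mapped as Normal Non-Cacheable rather than Device memory; it does not change how anything actually gets mapped.
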
> >
> > Okay, this is perfect and I think you just put me on the right track
> > for identifying the exact issue. Thanks!
> >
> > I've sliced up the non-prefetchable window and given it a prefetchable window.
> > The 256MB BAR now resides in that window.
> > However I'm still getting bus errors, so it seems the prefetch isn't
> > actually happening.
>
> Note that "prefetchable" really just means "no side-effects on reads",
> i.e. we can map it with a Normal memory type that technically *allows*
> the CPU to make speculative accesses because they will not be harmful,
> but that's not to say the CPU will do so. Just that if it did, you
> wouldn't notice anyway.
>
> It's entirely possible that the PCIe IP itself doesn't like unaligned
> accesses, so changing the memory type just moves you from an alignment
> fault to an external abort.

Okay, I've tried setting up PL_COHERENCY_CONTROL_3_OFF, where AxCACHE
can be forced from automatic to predefined values for reads and writes.
As I understand it, the cache bit should permit characteristic
mismatch to be accepted and prefetch to be enabled, when combined with
the read/write bits.
It doesn't seem to make a difference however.
I got the idea to look for this from the Armada8K and Tegra drivers.

It would be nice to know if dGPUs work at all on *any* DWC based PCIe
controllers.
We could use those as a starting point to find out what's broken here.

>
> > The difference is that now the GPU realizes an error has happened and
> > initiates recovery, whereas before it seemed to be clueless.
> > If I understand everything correctly, that's because before, the bus
> > error was raised by the CPU due to the memory type, whereas now
> > it's actually the bus raising the alarm.
> >
> > My next question: is this something the driver should be setting and
> > isn't, or is it just a consequence of the broken cache coherency?
>
> The general rule for userspace mmap()ing PCIe-attached memory and
> handing it off to glibc or anyone else who might assume it's regular
> system RAM is "don't do that". If it's not access size or alignment that
> falls over, it could be atomic operations, MTE tags, or any other
> new-fangled memory innovation. For the ultimate dream of just plugging
> in a card full of RAM, you either need to look back to ISA or forward to
> CXL ;)
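
To illustrate the access size and alignment point, consider the overlapping-store trick that optimized memcpy implementations commonly use for short copies (glibc's AArch64 memcpy handles 8- to 16-byte copies this way in at least some versions): one 8-byte store at the start of the destination and one at dst + n - 8. The C model below is illustrative only, not glibc's code, and misaligned_stores_8_to_16() is a hypothetical name; it counts how many of those two stores are not 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative model (not glibc's actual code) of the overlapping
 * short-copy strategy: for 8 <= n <= 16, one 8-byte store at dst and
 * one 8-byte store at dst + n - 8. Returns how many of those stores are
 * misaligned for their 8-byte size, or -1 if n is outside that range. */
static int misaligned_stores_8_to_16(uintptr_t dst, size_t n)
{
    if (n < 8 || n > 16)
        return -1;

    uintptr_t addrs[2] = { dst, dst + n - 8 };
    int bad = 0;
    for (int i = 0; i < 2; i++) {
        if (addrs[i] % 8 != 0)
            bad++;
    }
    return bad;
}
```

For example, a 12-byte copy to an 8-byte-aligned address produces one 8-byte store at dst + 4. That is harmless in system RAM, but it can fault on Device memory or be rejected by a PCIe controller that does not accept unaligned accesses.
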

So either go back to the really old way of doing things, find and fix
the underlying problem, or wait for the IP to catch up?

>
> Robin.

Thanks!
Peter


