On Wed, Mar 16, 2022 at 11:08 PM Kever Yang <kever.yang@xxxxxxxxxxxxxx> wrote:
>
> Hi Peter,
>
> On 2022/3/17 08:14, Peter Geis wrote:
> > Good Evening,
> >
> > I apologize for raising this email chain from the dead, but there have
> > been some developments that have introduced even more questions.
> > I've looped the Rockchip mailing list into this too, as this affects
> > rk356x, and likely the upcoming rk3588 if [1] is to be believed.
> >
> > TLDR for those not familiar: It seems the rk356x series (and possibly
> > the rk3588) were built without any outer coherent cache.
> > This means (unless Rockchip wants to clarify here) devices such as the
> > ITS and PCIe cannot utilize cache snooping.
> > This is based on the results of the email chain [2].
> >
> > The new circumstances are as follows:
> > The RPi CM4 Adventure Team, as I've taken to calling them, has been
> > attempting to get a dGPU working with the very broken Broadcom
> > controller in the RPi CM4.
> > Recently they acquired a SoQuartz rk3566 module, which is pin
> > compatible with the CM4, and have taken to trying it out as well.
> >
> > This is how I got involved.
> > It seems they found a trivial way to force the Radeon R600 driver to
> > use non-cached memory for everything.
> > This single-line change, combined with using memset_io instead of
> > memset, allows the ring tests to pass and the card to probe
> > successfully (minus the DMA limitations of the rk356x due to its
> > 32-bit interconnect).
> > Using this method, I discovered that we start hitting unaligned IO
> > memory access faults (bus errors) when running glmark2-drm (running
> > glmark2 directly was impossible, as both X and Wayland crashed too
> > early).
> > I traced this to what I thought at the time was an unsafe memcpy
> > in the mesa stack.
> > Rewriting this function to force aligned writes solved the problem and
> > allowed glmark2-drm to run to completion.
> > With some extensive debugging, I found about half a dozen memcpy
> > functions in mesa that, if forced to be aligned, would allow Wayland to
> > start, but with hilarious display corruption (see [3], [4]).
> > The CM4 team is convinced this is an issue with memcpy in glibc, but
> > I'm not convinced it's that simple.
> >
> > On my two-hour drive into work this morning, I got to thinking.
> > If this were a memcpy fault, it would be universally broken on arm64,
> > which is obviously not the case.
> > So I started thinking about what is different here compared with
> > systems known to work:
> > 1. No IOMMU for the PCIe controller.
> > 2. The Outer Cache Issue.
> >
> > Robin:
> > My questions for you, since you're the smartest person I know about
> > arm64 memory management:
> > Could cache snooping permit unaligned accesses to IO to be safe?
> > Or
> > Is it the lack of an IOMMU that's causing the alignment faults to become fatal?
> > Or
> > Am I insane here?
> >
> > Rockchip:
> > Please update on the status for the Outer Cache errata for ITS services.
>
> Our SoC design team has double-checked with the ARM GIC/ITS IP team many
> times, and the GITS_CBASER of the GIC600 IP does not support hardware
> binding or configuration to a fixed value, so they insist this is an IP
> limitation instead of a SoC bug, and software should take care of it :(
> I will check again if we can provide errata for this issue.

Thanks.
This is necessary as the mbi-alias provides an imperfect implementation of
the ITS and causes certain PCIe cards (e.g. the Intel X520 10G NIC) to
misbehave.

>
> > Please provide an answer to the errata of the PCIe controller, in
> > regard to cache snooping and buffering, for both the rk356x and the
> > upcoming rk3588.
>
> Sorry, what is this?

Part of the ITS bug is that it expects to be cache-coherent with the CPU
cluster by design.
Due to the rk356x being implemented without an outer accessible cache, the
ITS and other devices that require cache coherency (PCIe, for example)
crash in fun ways.
This means that the rk356x cannot implement a specification-compliant ITS or PCIe.
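Earlier in the thread, rewriting a mesa memcpy to force aligned writes is what made the bus errors go away. Below is a minimal C sketch of that idea. It assumes the destination is a non-cached (Device-type) mapping on arm64, where unaligned stores can fault, while the source is ordinary cacheable memory. The function name `aligned_copy_to_io` is hypothetical. This is not the actual mesa change, which the thread does not show. The strategy (byte stores up to an 8-byte boundary, then 8-byte stores, then a byte tail) is similar in spirit to the arm64 kernel's `memcpy_toio`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not the mesa fix from the thread: copy n bytes from
 * normal memory at src into a mapping at dst that must only ever see
 * naturally aligned stores (assumed: an uncached/Device mapping on arm64).
 */
static void aligned_copy_to_io(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d8 = (volatile uint8_t *)dst;
    const uint8_t *s8 = (const uint8_t *)src;

    /* Debug-build precondition: pointers must be valid for non-empty copies. */
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Head: single-byte stores until dst reaches an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)d8 & 7u) != 0) {
        *d8++ = *s8++;
        n--;
    }

    /*
     * Body: 8-byte stores, always aligned on the destination side. The
     * source may be unaligned, so it is loaded through memcpy into a local;
     * unaligned loads from normal cacheable memory are harmless.
     * The volatile store keeps the compiler from merging or widening it.
     */
    while (n >= 8) {
        uint64_t word;
        memcpy(&word, s8, sizeof(word));
        *(volatile uint64_t *)d8 = word;
        d8 += 8;
        s8 += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, one aligned byte store at a time. */
    while (n > 0) {
        *d8++ = *s8++;
        n--;
    }
}
```

In a real driver, `dst` would point into a mapping of GPU memory obtained from the kernel. Only the destination side needs the alignment discipline, because the faults described in the thread come from unaligned stores into non-cached memory.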