> > This is based on the results of the email chain [2].
> >
> > The new circumstances are as follows:
> > The RPi CM4 Adventure Team, as I've taken to calling them, has been
> > attempting to get a dGPU working with the very broken Broadcom
> > controller in the RPi CM4.
> > Recently they acquired a SoQuartz rk3566 module, which is pin
> > compatible with the CM4, and have taken to trying it out as well.
> >
> > This is how I got involved.
> > It seems they found a trivial way to force the Radeon R600 driver to
> > use Non-Cached memory for everything.
>
> Yeah, you basically just force it into AGP mode :)
>
> There is just absolutely no guarantee that this works reliably.

Ah, that makes sense.

> > This single-line change, combined with using memset_io instead of
> > memset, allows the ring tests to pass and the card to probe successfully
> > (minus the DMA limitations of the rk356x due to the 32-bit
> > interconnect).
> > Using this method, I discovered that we start getting unaligned I/O
> > memory access faults (bus errors) when running glmark2-drm (running
> > glmark2 directly was impossible, as both X and Wayland crashed too
> > early).
> > I traced this to what I thought at the time was an unsafe memcpy
> > in the Mesa stack.
> > Rewriting this function to force aligned writes solved the problem and
> > allowed glmark2-drm to run to completion.
> > With some extensive debugging, I found about half a dozen memcpy
> > functions in Mesa that, if forced to be aligned, would allow Wayland to
> > start, but with hilarious display corruption (see [3], [4]).
> > The CM4 team is convinced this is an issue with memcpy in glibc, but
> > I'm not convinced it's that simple.
>
> Yes, exactly that.
>
> Both OpenGL and Vulkan allow the application to mmap() device memory and
> do any memory access they want with that.
>
> This means that changing memcpy is just a futile effort; it's still
> possible for the application to make an unaligned memory access, and that
> is perfectly valid.

I was afraid of that, and it reflects what I see with X11's behavior.

> > On my two-hour drive into work this morning, I got to thinking.
> > If this were a memcpy fault, it would be universally broken on arm64,
> > which is obviously not the case.
> > So I started thinking about what is different here from systems known to work:
> > 1. No IOMMU for the PCIe controller.
> > 2. The Outer Cache Issue.
>
> Oh, very good point. I would be interested in that answer as well.
>
> Regards,
> Christian.
>
> >
> > Robin:
> > My questions for you, since you're the smartest person I know about
> > arm64 memory management:
> > Could cache snooping permit unaligned accesses to IO to be safe?
> > Or
> > Is it the lack of an IOMMU that's causing the alignment faults to become fatal?
> > Or
> > Am I insane here?
> >
> > Rockchip:
> > Please provide an update on the status of the Outer Cache errata for ITS services.
> > Please provide an answer regarding the PCIe controller errata with respect to
> > cache snooping and buffering, for both the rk356x and the
> > upcoming rk3588.
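Peter reports that forcing a few Mesa copies to use aligned writes on the destination side avoided the bus errors. A minimal C sketch of such a copy follows, assuming (this is not confirmed in the thread) that the mapping tolerates byte stores and naturally aligned 32-bit stores but faults on unaligned wider ones. `copy_to_dev_aligned` is a hypothetical name, not Mesa's actual function. As Christian points out, this only covers copies you control: an application can still mmap() device memory and access it unaligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not Mesa code): copy n bytes from normal memory to a
 * destination that may be uncached device memory, issuing only naturally
 * aligned stores to the destination.
 * Assumption: byte stores and aligned 32-bit stores are safe on the mapping,
 * while unaligned multi-byte stores may fault.
 */
void copy_to_dev_aligned(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Head: single-byte stores until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: aligned 32-bit stores. The source may be unaligned, so each word
     * is read through memcpy into a local, which is safe for normal memory. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof(word));
        *(volatile uint32_t *)d = word;
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: remaining bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Only the destination alignment is controlled here; the source side is ordinary cacheable memory, where unaligned reads are not the problem being discussed.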
> >
> > [1] https://github.com/JeffyCN/mirrors/commit/0b985f29304dcb9d644174edacb67298e8049d4f
> > [2] https://lore.kernel.org/lkml/871rbdt4tu.wl-maz@kernel.org/T/
> > [3] https://cdn.discordapp.com/attachments/926487797844541510/953414755970850816/unknown.png
> > [4] https://cdn.discordapp.com/attachments/926487797844541510/953424952042852422/unknown.png
> >
> > Thank you, everyone, for your time.
> >
> > Very Respectfully,
> > Peter Geis
> >
> > On Wed, May 26, 2021 at 7:21 AM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> >> Hi Robin,
> >>
> >> On 26.05.21 at 12:59, Robin Murphy wrote:
> >>> On 2021-05-26 10:42, Christian König wrote:
> >>>> Hi Robin,
> >>>>
> >>>> On 25.05.21 at 22:09, Robin Murphy wrote:
> >>>>> On 2021-05-25 14:05, Alex Deucher wrote:
> >>>>>> On Tue, May 25, 2021 at 8:56 AM Peter Geis <pgwipeout@xxxxxxxxx>
> >>>>>> wrote:
> >>>>>>> On Tue, May 25, 2021 at 8:47 AM Alex Deucher
> >>>>>>> <alexdeucher@xxxxxxxxx> wrote:
> >>>>>>>> On Tue, May 25, 2021 at 8:42 AM Peter Geis <pgwipeout@xxxxxxxxx>
> >>>>>>>> wrote:
> >>>>>>>>> Good Evening,
> >>>>>>>>>
> >>>>>>>>> I am stress testing the PCIe controller on the rk3566-quartz64
> >>>>>>>>> prototype SBC.
> >>>>>>>>> This device has 1GB available at <0x3 0x00000000> for the PCIe
> >>>>>>>>> controller, which makes a dGPU theoretically possible.
> >>>>>>>>> While attempting to light off an HD7570 card, I managed to get a
> >>>>>>>>> modeset console, but the ring0 test fails and disables acceleration.
> >>>>>>>>>
> >>>>>>>>> Note: we do not have UEFI, so all PCIe setup is done by the Linux
> >>>>>>>>> kernel.
> >>>>>>>>> Any insight you can provide would be much appreciated.
> >>>>>>>> Does your platform support PCIe cache coherency with the CPU? I.e.,
> >>>>>>>> does the CPU allow cache snoops from PCIe devices? That is required
> >>>>>>>> for the driver to operate.
> >>>>>>> Ah, most likely not.
> >>>>>>> This issue has come up already, as the GIC isn't permitted to snoop on
> >>>>>>> the CPUs, so I doubt the PCIe controller can either.
> >>>>>>>
> >>>>>>> Is there no way to work around this, or is it dead in the water?
> >>>>>> It's required by the PCIe spec. You could potentially work around it
> >>>>>> if you can allocate uncached memory for DMA, but I don't think that is
> >>>>>> possible currently.
> >>>>>> Ideally we'd figure out some way to detect whether a
> >>>>>> particular platform supports cache snooping or not as well.
> >>>>> There's device_get_dma_attr(), although I don't think it will work
> >>>>> currently for PCI devices without an OF or ACPI node; we could
> >>>>> perhaps do with a PCI-specific wrapper which can walk up and defer
> >>>>> to the host bridge's firmware description as necessary.
> >>>>>
> >>>>> The common DMA ops *do* correctly keep track of per-device coherency
> >>>>> internally, but drivers aren't supposed to be poking at that
> >>>>> information directly.
> >>>> That sounds like you underestimate the problem. ARM has unfortunately
> >>>> made the coherency for PCI an optional IP.
> >>> Sorry to be that guy, but I'm involved a lot internally with our
> >>> system IP and interconnect, and I probably understand the situation
> >>> better than 99% of the community ;)
> >> I need to apologize; I didn't realize who was answering :)
> >>
> >> It just sounded to me like you wanted to suggest to the end user that
> >> this is fixable in software, and I really wanted to avoid even more
> >> customers coming around asking how to do this.
> >>
> >>> For the record, the SBSA specification (the closest thing we have to a
> >>> "system architecture") does require that PCIe is integrated in an
> >>> I/O-coherent manner, but we don't have any control over what people do
> >>> in embedded applications (note that we don't make PCIe IP at all, and
> >>> there is plenty of 3rd-party interconnect IP).
> >> So basically it is not the fault of the ARM IP core, but people are just
> >> stitching together PCIe interconnect IP with a core it is not
> >> supposed to be used with.
> >>
> >> Do I understand that correctly? That's an interesting puzzle piece in the picture.
> >>
> >>>> So we are talking about a hardware limitation which potentially can't
> >>>> be fixed without replacing the hardware.
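Robin's suggested PCI-specific wrapper, which walks up the hierarchy and defers to the host bridge's firmware description, can be modeled in a few lines of userspace C. This is a hypothetical sketch: `struct model_dev`, `model_pci_get_dma_attr()` and the `ATTR_*` values are invented for illustration and only loosely mirror the three outcomes of the kernel's `device_get_dma_attr()`; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values, loosely mirroring the kernel's three-way result
 * (DMA not supported / non-coherent / coherent). */
enum dma_attr { ATTR_NOT_SUPPORTED, ATTR_NON_COHERENT, ATTR_COHERENT };

/* Hypothetical stand-in for a node in the device hierarchy (endpoint,
 * bridges, host bridge). Only nodes described by firmware (OF/ACPI in the
 * real world) carry a meaningful coherency attribute. */
struct model_dev {
    const char *name;
    struct model_dev *parent;
    bool has_fw_node;
    enum dma_attr fw_attr;
};

/* Hypothetical wrapper: use the device's own firmware description if it
 * has one, otherwise walk up towards the host bridge and defer to the
 * first ancestor that is described. */
enum dma_attr model_pci_get_dma_attr(const struct model_dev *dev)
{
    for (const struct model_dev *d = dev; d != NULL; d = d->parent) {
        if (d->has_fw_node)
            return d->fw_attr;
    }
    /* In this model, nothing on the path describes DMA at all. */
    return ATTR_NOT_SUPPORTED;
}
```

The lookup is per device rather than a single platform flag, in line with Robin's point that coherency can vary per device within one system.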
> >>> You expressed interest in "some way to detect if a particular platform
> >>> supports cache snooping or not", by which I assumed you meant a
> >>> software method for the amdgpu/radeon drivers to call, rather than,
> >>> say, a website that driver maintainers can look up SoC names on. I'm
> >>> saying that that API already exists (it just may need a bit more work).
> >>> Note that it is emphatically not a platform-level thing, since
> >>> coherency can and does vary per device within a system.
> >> Well, I think this is not something an individual driver should mess
> >> with. What the driver should do is just express that it needs coherent
> >> access to all of system memory and, if that is not possible, fail to load
> >> with a warning explaining why.
> >>
> >>> I wasn't suggesting that Linux could somehow make coherency magically
> >>> work when the signals don't physically exist in the interconnect; I
> >>> was assuming you'd merely want to do something like throw a big
> >>> warning and taint the kernel to help triage bug reports. Some drivers,
> >>> like ahci_qoriq and panfrost, simply need to know so they can program
> >>> their device to emit the appropriate memory attributes either way, and
> >>> rely on the DMA API to hide the rest of the difference, but if you
> >>> want to treat non-coherent use as unsupported because it would require
> >>> overly invasive changes, that's fine by me.
> >> Yes, exactly that, please. I'm not sure how panfrost is doing it, but
> >> at least the Vulkan userspace API specification requires devices to have
> >> coherent access to system memory.
> >>
> >> So even if I wanted to do this, it is simply not possible, because the
> >> application doesn't tell the driver which memory is accessed by the
> >> device and which by the CPU.
> >>
> >> Christian.
> >>
> >>> Robin.
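Christian's preferred policy, refusing to load with an explanation when coherent access to system memory is unavailable, could look roughly like the probe-time check below. This is a hypothetical userspace model: `model_check_coherency()`, the `ATTR_*` values and the choice of `-ENODEV` are assumptions for illustration, not code from amdgpu or radeon. Robin's alternative (load anyway, warn loudly, and taint the kernel) would replace the early error return with a warning.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical coherency result, loosely mirroring the kernel's three-way
 * device_get_dma_attr() outcome. */
enum dma_attr { ATTR_NOT_SUPPORTED, ATTR_NON_COHERENT, ATTR_COHERENT };

/* Hypothetical probe-time policy for a driver whose userspace API assumes
 * coherent device access to system memory: bind only when DMA is coherent,
 * otherwise explain why and refuse (modeled as returning -ENODEV). */
int model_check_coherency(const char *dev_name, enum dma_attr attr)
{
    if (attr == ATTR_COHERENT)
        return 0;

    fprintf(stderr,
            "%s: PCIe DMA is not cache coherent on this platform; "
            "this driver requires coherent access to system memory, "
            "refusing to load\n",
            dev_name);
    return -ENODEV;
}
```

Failing early keeps the unsupported configuration from surfacing later as silent corruption, which is the outcome Christian wants to avoid given that userspace never tells the driver which memory the CPU and device share.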