Alex Deucher <alexdeucher@xxxxxxxxx> writes:

> On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@xxxxxxx> wrote:
>>
>> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
>> > > I'm not familiar enough with ARM to know if write combining
>> > > is actually an architectural limitation or if it's an issue
>> > > with the PCIe IPs used on various platforms, but so far
>> > > everyone that has tried to run radeon hardware on
>> > > ARM has had to disable it. So let's just make it official.
>> >
>> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
>> > mappings and stuff, so you need to allocate your wc memory from special
>> > pools. So probably best to just disable it until we figure this out.
>>
>> I believe both of you are conflating different issues under the wrong
>> name. Write combining happens all the time with Arm, the ARMv8
>> architecture is a weakly-ordered model of memory so hardware is allowed
>> to re-order or combine memory access as they seem fit.
>>
>> A while ago I did run an AMD GPU card on my Juno dev board and it worked
>> (for a very limited definition of worked, I've only validated the fact
>> that I could get an fbcon and could run un-accelerated X11). So I would
>> be interested if Alex could share some of the scenarios where people are
>> seeing failures.
>
> Here's an example:
> https://bugs.freedesktop.org/show_bug.cgi?id=108625
> But there are probably 5 or 6 other cases where people have emailed me
> or our team directly with issues on ARM resolved by disabling WC.
> Generally the driver seems to load ok, but then hangs as soon as you
> try and use acceleration from userspace or we end up with page
> flipping timeouts. Not really sure what the issue is. Michel
> suggested maybe ARM has a cacheable kernel mapping of all "normal"
> system memory, and having both that mapping and another non-cacheable
> mapping of the same page can result in bad behaviour.
>
>>
>> As for aliasing, yeah, having multiple aliases to the same piece of
>> memory is a bad thing. The problem arises when devices on the PCI bus
>> have memory allocated as device memory (which on Arm is non-cacheable
>> and non-reorderable), but the PCI bus effectively acts as a write-combiner
>> which changes the order of transactions. Therefore, for devices that
>> have local memory associated with them (i.e. more than just register
>> accesses) one should allocate memory in the first place that is
>> Device-GRE (gathering, reordering and early-access). Otherwise, problems
>> will surface that are not visible on x86 as that is a strongly ordered
>> architecture.
>
> PCI framebuffer BARs are mapped on the CPU with WC. We also use
> uncached WC mappings for system memory in cases where it's not likely
> we will be doing any CPU reads. When accessing system memory, the GPU
> can either do a CPU cache snooped transaction or a non-snooped
> transaction. The non-snooped transaction has lower latency and better
> throughput since it doesn't have to snoop the CPU cache.
>
>>
>> > >
>> > > Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>> >
>> > Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
>>
>> Given that this API is only used by AMD I'm OK for now with the change,
>> but I think in general it is misleading and we should work towards
>> fixing radeon and amd drivers.
>
> Alternatively, we could just disable WC in the amdgpu driver on ARM.
> I'm not sure to what extent other drivers are using WC in general or
> have been tested on ARM.

FWIW, I use WC mappings of BOs on V3D (shmem) and VC4 (cma). V3D is
totally stable. VC4 I've heard reports of stability issues long-term
but I don't think it's related. I don't do any cached mappings of my
BOs, though.
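For reference, a minimal sketch of the "just disable WC on ARM" approach being discussed, written as standalone C rather than kernel code. All names here (arch_can_wc_memory, bo_sanitize_flags, the BO_FLAG_* bits) are hypothetical and chosen for illustration; they are not taken from the actual patch or from any driver. The sketch assumes the compiler's predefined __arm__ / __aarch64__ macros identify the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer-object mapping flags (illustration only). */
#define BO_FLAG_CPU_ACCESS 0x1u  /* CPU will touch this buffer */
#define BO_FLAG_WC         0x2u  /* request a write-combined CPU mapping */
#define BO_FLAG_MASK       (BO_FLAG_CPU_ACCESS | BO_FLAG_WC)

/*
 * Hypothetical per-architecture check: report whether write-combined
 * mappings should be used at all.  Returns false on 32-bit and 64-bit
 * Arm, true everywhere else.
 */
bool arch_can_wc_memory(void)
{
#if defined(__arm__) || defined(__aarch64__)
    return false;
#else
    return true;
#endif
}

/*
 * Drop the WC request when the platform cannot use it, leaving all
 * other flags untouched.  The caller then falls back to whatever the
 * default (non-WC) mapping is.
 */
unsigned int bo_sanitize_flags(unsigned int requested, bool can_wc)
{
    assert((requested & ~BO_FLAG_MASK) == 0);  /* reject unknown bits */

    if (!can_wc)
        requested &= ~BO_FLAG_WC;
    return requested;
}

/* Convenience wrapper using the compile-time architecture check. */
unsigned int bo_effective_flags(unsigned int requested)
{
    return bo_sanitize_flags(requested, arch_can_wc_memory());
}
```

A driver following this pattern would call something like bo_effective_flags() once at allocation time, so the WC decision lives in one place. Whether that check belongs in a shared helper or in the individual driver is exactly the open question above.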
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx