On 13.05.20 at 09:19, Daniel Vetter wrote:
On Tue, May 12, 2020 at 8:22 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Tue, May 12, 2020 at 12:38 PM Daniel Vetter <daniel@xxxxxxxx> wrote:
On Tue, May 12, 2020 at 3:22 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Tue, May 12, 2020 at 5:40 AM Karoly Balogh (Charlie/SGR)
<charlie@xxxxxxxxxxxxxxxx> wrote:
Hi,
On Tue, 12 May 2020, Rui Salvaterra wrote:
FWIW, on my last-generation PowerBook with RV350 (IIRC), there was a
big performance difference between AGP and PCI GART. The latter was
sort of usable for normal desktop operation, but not so much for
OpenGL apps (which were usable with AGP).
I never really understood what the issues were with AGP on PowerPC
(well, Apple, the only ones I've tested) machines. I mean, did OS X also
disable AGP entirely, or did it have workarounds somewhere else on the
stack nobody was able to figure out?
I don't know about OS X, but I doubt there is a major/blocker hardware
issue, at least not one which affects every AGP machine.
MorphOS' own Radeon driver uses the AGP facilities to some degree on all
AGP PowerPC Macs supported by that OS, which is from PMac AGP Graphics
(3,1) all the way up to the AGP G5 (7,3), including the various portables
and the Mac mini G4. For example it can utilize it to stream video data
directly from mainboard RAM, so you don't have to copy it with the CPU,
allowing reasonably good 720p h264 video playback on most systems above
the 1 GHz mark with the native MPlayer port. I'm sure the 3D part of the
driver also uses it to some degree, given the performance improvement we
experienced when the AGP support was enabled (initially the system was
running without it), but to which extent I can't say.
The problem is AGP doesn't support CPU cache snooping. Technically
PCI must support coherent device access to system memory. Unsnooped
access is an optional feature and some platforms may not support it at
all. Unfortunately, AGP required unsnooped access. x86 generally
provides a way to do this, but other platforms, not so much. I don't
recall to what extent PowerPC supported this. The Linux DMA API
doesn't really have a way to get uncached memory for DMA so there is
that too. Windows and Mac may provide a way to do this depending on
the platforms. What probably should have been done on AGP boards was
to use both the AGP GART and the device GART. The former for uncached
memory (if the platform supported it) and the latter for cached
memory. That never happened.
Slight correction on the dma-api side of things: The dma-api very much
can give you uncached memory, but only on some platforms, and the
dma-api is very opinionated about which those are. And it refuses to
tell you whether your memory ends up being uncached or cached. That's
all done in the name of platform portability, which is good for most
drivers, but just too much pain for gpu drivers.
Out of curiosity how do you do that without manually messing around
with PAT or MTRRs?
i915 is even worse, we manually mess around with clflush. In
userspace. So really there are 2 axes for dma memory: coherent vs.
non-coherent (which is something the dma-api somewhat exposed), i.e.
do you need to clflush or not, and cached vs uncached, i.e. are the
PAT entries wc or wb.
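
To make the two axes concrete, here is a minimal userspace sketch. The enum and function names are hypothetical and are not kernel API; the rule that a flush is only needed for the non-coherent plus write-back combination is a simplifying assumption, and write-combine buffer draining is not modeled.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of this is kernel API. */

/* Axis 1: does the device see CPU caches (coherent) or not? */
enum dma_coherency { DMA_COHERENT, DMA_NONCOHERENT };

/* Axis 2: how the CPU maps the pages (what the PAT entry says). */
enum cpu_mapping { MAP_WB, MAP_WC };

/*
 * Assumption: an explicit CPU cache flush (clflush on x86) is needed
 * only when the CPU mapping is write-back, so dirty lines can exist,
 * and the device access is non-coherent, so it would miss them.
 */
static bool needs_cache_flush(enum dma_coherency coh, enum cpu_mapping map)
{
    return coh == DMA_NONCOHERENT && map == MAP_WB;
}
```

The point of keeping the axes separate is that a driver has to answer both questions for each buffer; the dma-api only lets it ask about the first one.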
But yeah if you don't have PAT real uncached isn't possible, you can't frob
MTRRs for individual pages. That's also, to my understanding, why the dma
api doesn't want to expose this to drivers, but abstracts it all away:
On many tiny soc platforms all you have for uncached is an MTRR (well
the equivalent on that platform), so anything you get from
dma_alloc_coherent needs to come from there.
IIRC I once got it explained like this: On some platforms all you have is a
register with a value, and if your address is above that value it is
uncached and wc; if it is below, it is cached and wb.
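
As a rough sketch of that kind of platform (hypothetical names, no real hardware implied), the whole memory-attribute decision collapses into one comparison against a boundary value. How an address exactly equal to the boundary is treated is an assumption here; the description above doesn't say.

```c
#include <stdint.h>

/* Hypothetical attribute names for illustration only. */
enum mem_attr { MEM_CACHED_WB, MEM_UNCACHED_WC };

/*
 * Single boundary register: addresses above it are uncached/wc,
 * addresses below it are cached/wb. An address equal to the
 * boundary is treated as cached here, which is an assumption.
 */
static enum mem_attr classify_addr(uint64_t addr, uint64_t boundary)
{
    return addr > boundary ? MEM_UNCACHED_WC : MEM_CACHED_WB;
}
```

With a scheme like this there is no per-page control at all, which is why an allocator on such a platform has to hand out uncached memory from one fixed region.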
But then no one is ever going to plug in a big gpu into such a system
and expect anything to work, so we really need an abstraction that
works on a bit more than just x86 (so we don't dig around in platform
stuff like updating PAT or issuing clflush anymore), but doesn't try
to work everywhere linux runs, just on the few platforms people expect
big gpus to work on. For all the kms-only drivers we have the dma api
seems actually perfectly fine (essentially the cma helpers we have
should be called dma helpers, since that's what they're using
underneath for all buffer management).
That is unfortunately not true for AMD GPUs, people tend to put them
into those embedded ARM or PowerPC boxes and just expect them to work.
On the other hand we have hardware/firmware engineers who assumed you
always have USWC and we wonder for weeks why firmware loading doesn't
work....
Regards,
Christian.
Cheers, Daniel
Alex
Otherwise all agreed, agp is a mighty mess and essentially just a
crapshoot outside of x86. It kinda worked for the much more static
allocations for dri1, but with in-kernel memory managers all the cache
flushing issues showed up big time and it all fell to pieces. Plus a
lot of these host chipsets back then were designed for the rather
static windows gpu managers, so even on x86 the coherency issues for
agp mode when used together with ttm or something else really dynamic
are pretty bad because the hw just doesn't really cope and has all
kinds of flushing troubles and races. I think the later agp chipsets
were better.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel