Re: [PATCH 0/6] Radeon memory management improvements

Christian König <deathsimple@xxxxxxxxxxx> · Tue, 25 Feb 2014 11:11:52 +0100

On 24.02.2014 20:39, Marek Olšák wrote:
On Mon, Feb 24, 2014 at 5:40 PM, Christian König
<deathsimple@xxxxxxxxxxx> wrote:
On 24.02.2014 16:20, Marek Olšák wrote:
1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
which not only solves fragmentation issues, but it also allows each buffer
to be partially in VRAM and partially in GTT, which becomes more important
with large buffers like 100 MB. Moving whole buffers back and forth between
VRAM and GTT is inefficient if you can do it at page granularity. Also, due
to fragmentation, we can never really use all of VRAM, but only about
90-95%.

Yeah, I've also been thinking about this for quite some time now. The basic
problem is that while our GPUs support VM, they don't support faulting pages
in and continuing (at least nobody has gotten that working reliably so far).
E.g. when you hit a page fault you can't relocate the page and then continue.

Support for partially resident textures on newer hardware currently works by
splitting the buffer up into smaller buffers in userspace and then actively
checking in the shader if we hit a buffer that's not currently in memory,
but that's not really applicable in the general use case (too much shader
overhead).
I was thinking of splitting buffers into smaller chunks and treating
them like independent TTM buffers, i.e. one radeon_bo would contain an
array of TTM buffers which would be validated independently of each
other. The chunks would only be mapped together to make them look like
one buffer. This would be hidden from userspace and there would only
be one GEM handle for the whole buffer, so that DRI2 sharing works.
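
A minimal sketch of the chunked layout described above, assuming a fixed
2 MiB chunk size and a simple greedy placement policy. Every name here
(sketch_bo, sketch_chunk, sketch_bo_validate, ...) is hypothetical and only
for illustration; none of it is the radeon or TTM API, and real validation
would go through TTM rather than a byte budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the radeon or TTM API. */
#define SKETCH_CHUNK_SIZE (2u * 1024u * 1024u) /* assumed chunk size: 2 MiB */
#define SKETCH_MAX_CHUNKS 64u                  /* assumed upper bound */

enum sketch_domain { SKETCH_DOMAIN_GTT = 0, SKETCH_DOMAIN_VRAM = 1 };

/* One independently placed piece of the buffer. */
struct sketch_chunk {
    enum sketch_domain domain; /* where this chunk currently lives */
    uint64_t size;             /* bytes covered by this chunk */
};

/* One buffer object: a single handle for userspace, many chunks inside. */
struct sketch_bo {
    uint32_t gem_handle;
    uint64_t size;
    size_t num_chunks;
    struct sketch_chunk chunks[SKETCH_MAX_CHUNKS];
};

/* Split a buffer into chunks, all starting in GTT.
 * Returns 0 on success, -1 if the size is zero or needs too many chunks. */
static int sketch_bo_init(struct sketch_bo *bo, uint32_t handle, uint64_t size)
{
    assert(bo != NULL);

    uint64_t n = (size + SKETCH_CHUNK_SIZE - 1) / SKETCH_CHUNK_SIZE;
    if (size == 0 || n > SKETCH_MAX_CHUNKS)
        return -1;

    bo->gem_handle = handle;
    bo->size = size;
    bo->num_chunks = (size_t)n;
    for (size_t i = 0; i < bo->num_chunks; i++) {
        uint64_t offset = (uint64_t)i * SKETCH_CHUNK_SIZE;
        uint64_t left = size - offset;
        bo->chunks[i].size = left < SKETCH_CHUNK_SIZE ? left : SKETCH_CHUNK_SIZE;
        bo->chunks[i].domain = SKETCH_DOMAIN_GTT;
    }
    return 0;
}

/* Place each chunk on its own: VRAM while the budget lasts, GTT otherwise.
 * Returns the number of bytes that ended up in VRAM. */
static uint64_t sketch_bo_validate(struct sketch_bo *bo, uint64_t vram_free)
{
    assert(bo != NULL);

    uint64_t placed = 0;
    for (size_t i = 0; i < bo->num_chunks; i++) {
        struct sketch_chunk *c = &bo->chunks[i];
        if (c->size <= vram_free - placed) {
            c->domain = SKETCH_DOMAIN_VRAM;
            placed += c->size;
        } else {
            c->domain = SKETCH_DOMAIN_GTT;
        }
    }
    return placed;
}
```

With these assumed sizes, a 100 MiB buffer becomes 50 chunks, and a 64 MiB
VRAM budget leaves the first 32 chunks in VRAM and the rest in GTT, while
userspace still sees one handle.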

My thoughts were more to do this on the userspace side, but doing it in
the kernel indeed avoids a bunch of problems with sharing the buffer.

Sounds like a plan to me. The only thing I can currently see missing is 
handling of scanout buffers. Do we have a flag for this while creating 
the buffer?

2) Add support for uncached GTT. I think it should improve performance for
dGPUs under memory pressure, but some testing needs to be done to confirm
that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
to be working on some later chips.

Did you try to make the whole GTT uncached or just evicted BOs? Making the
whole GTT uncached probably won't work out of the box, but avoiding setting
the "SNOOPED" flag on those pages might get us better performance while
swapping them into VRAM again.
I made the whole GTT uncached.

I'm not sure if this will work, and at least for the ring buffers it's 
probably also not a good idea.
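
The alternative mentioned above, leaving the snoop flag off only for evicted
BOs while ring buffers and ordinary GTT pages stay snooped, could look
roughly like the sketch below. The flag bits and names are hypothetical
placeholders, not values from the radeon headers, and the sketch leaves out
the CPU-side work (uncached or write-combined mappings, or cache flushes)
that unsnooped pages would presumably need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not taken from the radeon headers. */
#define SKETCH_PTE_VALID   (1u << 0) /* entry is usable */
#define SKETCH_PTE_SYSTEM  (1u << 1) /* page lives in system memory (GTT) */
#define SKETCH_PTE_SNOOPED (1u << 2) /* GPU access snoops the CPU cache */

/* Assumed classification of what a GTT page is used for. */
enum sketch_page_use {
    SKETCH_USE_RING,    /* ring buffer: keep snooped */
    SKETCH_USE_NORMAL,  /* ordinary GTT buffer: keep snooped */
    SKETCH_USE_EVICTED, /* BO evicted from VRAM: skip snooping */
};

/* Build the page table entry flags for one GTT page. */
static uint32_t sketch_gtt_pte_flags(enum sketch_page_use use)
{
    uint32_t flags = SKETCH_PTE_VALID | SKETCH_PTE_SYSTEM;

    switch (use) {
    case SKETCH_USE_RING:
    case SKETCH_USE_NORMAL:
        flags |= SKETCH_PTE_SNOOPED;
        break;
    case SKETCH_USE_EVICTED:
        /* Unsnooped on purpose, so the copy back into VRAM skips the snoop. */
        break;
    default:
        assert(!"unknown page use");
    }
    return flags;
}
```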

Christian.
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel