Re: [PATCH 0/6] Radeon memory management improvements

Jerome Glisse <j.glisse@xxxxxxxxx> · Wed, 26 Feb 2014 20:17:16 -0500

On Mon, Feb 24, 2014 at 08:39:07PM +0100, Marek Olšák wrote:
> On Mon, Feb 24, 2014 at 5:40 PM, Christian König
> <deathsimple@xxxxxxxxxxx> wrote:
> > Am 24.02.2014 16:20, schrieb Marek Olšák:
> >> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
> >> which not only solves fragmentation issues, but it also allows each buffer
> >> to be partially in VRAM and partially in GTT, which becomes more important
> >> with large buffers like 100 MB. Moving whole buffers back and forth between
> >> VRAM and GTT is inefficient if you can do it at page granularity. Also, due
> >> to fragmentation, we can never really use all of VRAM, but only about
> >> 90-95%.
> >
> >
> > Yeah, I've also been thinking about this for quite some time now. The basic
> > problem is that while our GPUs support VM, they don't support faulting pages
> > in and continuing (at least nobody has got that working reliably so far). E.g.
> > when you hit a page fault you can't relocate the page and then continue.
> >
> > Support for partially resident textures on newer hardware currently works by
> > splitting the buffer up into smaller buffers in userspace and then actively
> > checking in the shader if we hit a buffer that's not currently in memory,
> > but that's not really applicable in the general use case (too much shader
> > overhead).
> 
> I was thinking of splitting buffers into smaller chunks and treating
> them like independent TTM buffers, i.e. one radeon_bo would contain an
> array of TTM buffers which would be validated independently of each
> other. The chunks would only be mapped together to make them look like
> one buffer. This would be hidden from userspace and there would only
> be one GEM handle for the whole buffer, so that DRI2 sharing works.

This is a bad idea: you will waste a lot of memory on all the TTM objects.
I think you should just decouple that from TTM. The TTM placement would be a
hint, i.e. if the TTM placement is VRAM, then the radeon code should try to
put as much of the buffer as possible in VRAM.

radeon would manage chunks of VRAM with a lightweight structure. Of course, if
such a thing is also useful for nvidia, then it would make sense to do that
in TTM.
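
To make the "lightweight structure" idea concrete, here is a rough userspace
sketch, not actual radeon or TTM code. Every name in it (chunked_bo,
vram_pool, the 64 KiB chunk size) is hypothetical and chosen only for
illustration. It assumes one bit of residency state per chunk and treats the
placement as a hint: as many chunks as the pool allows go to VRAM, the rest
stay in GTT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SHIFT 16                  /* hypothetical: 64 KiB chunks */
#define CHUNK_SIZE  (1u << CHUNK_SHIFT)

enum placement_hint { HINT_GTT, HINT_VRAM };

/* One buffer object; per-chunk state is a single bit, not a full object. */
struct chunked_bo {
    uint64_t size;
    uint32_t num_chunks;
    enum placement_hint hint;
    uint8_t *in_vram;                   /* bitmap: bit i set => chunk i in VRAM */
};

/* Stand-in for the VRAM manager: just counts free chunk-sized slots. */
struct vram_pool {
    uint32_t free_chunks;
};

static int chunked_bo_init(struct chunked_bo *bo, uint64_t size,
                           enum placement_hint hint)
{
    assert(bo != NULL);
    bo->size = size;
    bo->num_chunks = (uint32_t)((size + CHUNK_SIZE - 1) >> CHUNK_SHIFT);
    bo->hint = hint;
    bo->in_vram = calloc((bo->num_chunks + 7) / 8, 1);
    return bo->in_vram ? 0 : -1;
}

static bool chunk_in_vram(const struct chunked_bo *bo, uint32_t i)
{
    return (bo->in_vram[i / 8] >> (i % 8)) & 1u;
}

/*
 * Best-effort placement: the hint is not a hard requirement. Returns the
 * number of chunks newly placed in VRAM; chunks left over stay in GTT.
 */
static uint32_t chunked_bo_place(struct chunked_bo *bo, struct vram_pool *pool)
{
    uint32_t placed = 0;

    assert(bo != NULL && pool != NULL);
    if (bo->hint != HINT_VRAM)
        return 0;

    for (uint32_t i = 0; i < bo->num_chunks && pool->free_chunks > 0; i++) {
        if (chunk_in_vram(bo, i))
            continue;
        bo->in_vram[i / 8] |= (uint8_t)(1u << (i % 8));
        pool->free_chunks--;
        placed++;
    }
    return placed;
}
```

With these made-up numbers, a 100 MB buffer is 1600 chunks, so the residency
state is a 200-byte bitmap rather than 1600 separate objects.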

For the scanout buffer, this can be done when the buffer is bound to the
display, at which point a flag is set; that way we stay backward compatible
with old userspace.
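
A small sketch of the scanout case, again with purely hypothetical names
(sketch_bo, BO_FLAG_SCANOUT) and not taken from the radeon driver. It
assumes the flag means "this buffer must be entirely in VRAM and may not be
split", and that the kernel sets it at display-bind time so userspace does
not have to know about it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BO_FLAG_SCANOUT (1u << 0)   /* hypothetical flag, set by the kernel */

struct sketch_bo {
    uint32_t flags;
    uint32_t num_chunks;
    uint32_t vram_chunks;           /* chunks currently resident in VRAM */
};

/* Called when the buffer is bound to the display; no userspace involvement. */
static void sketch_bo_bind_display(struct sketch_bo *bo)
{
    assert(bo != NULL);
    bo->flags |= BO_FLAG_SCANOUT;
}

/* May the memory manager leave part of this buffer outside VRAM? */
static bool sketch_bo_may_split(const struct sketch_bo *bo)
{
    return !(bo->flags & BO_FLAG_SCANOUT);
}

/* A placement is acceptable if splitting is allowed or all chunks are in VRAM. */
static bool sketch_bo_placement_ok(const struct sketch_bo *bo)
{
    return sketch_bo_may_split(bo) || bo->vram_chunks == bo->num_chunks;
}
```

Under that assumption, a partially resident buffer is a valid placement until
it is bound to the display, and stops being one afterwards until every chunk
has been moved into VRAM.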

Cheers,
Jérôme

> 
> >
> >
> >> 2) Add support for uncached GTT. I think it should improve performance for
> >> dGPUs under memory pressure, but some testing needs to be done to confirm
> >> that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
> >> to be working on some later chips.
> >
> >
> > Did you try to make the whole GTT uncached or just evicted BOs? Making the
> > whole GTT uncached probably won't work out of the box, but avoiding setting
> > the "SNOOPED" flag on those pages might get us better performance while
> > swapping them into VRAM again.
> 
> I made the whole GTT uncached.
> 
> Marek
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/dri-devel