Re: [PATCH v3 3/8] drm/i915: Add a new "remapped" gtt_view

On 09/10/2018 12:54, Ville Syrjälä wrote:
On Tue, Oct 09, 2018 at 09:24:33AM +0100, Tvrtko Ursulin wrote:

On 05/10/2018 19:42, Ville Syrjälä wrote:
On Mon, Oct 01, 2018 at 04:48:21PM +0100, Tvrtko Ursulin wrote:

On 01/10/2018 16:37, Chris Wilson wrote:
Quoting Ville Syrjälä (2018-10-01 16:27:43)
On Mon, Oct 01, 2018 at 04:12:09PM +0100, Chris Wilson wrote:
Quoting Ville Syrjälä (2018-10-01 16:03:30)
On Wed, Sep 26, 2018 at 08:50:25AM +0100, Tvrtko Ursulin wrote:

On 25/09/2018 20:37, Ville Syrjala wrote:
One more thing, do you really need random access for this
transformation? Or you could walk the sg list as it is? Just if you hit
a too long chunk you need to copy a trimmed version over and know where
to continue for the next row. If doable it would be better than having
to kvmalloc_array.

I think Chris suggested just using i915_gem_object_get_dma_address()
here. But I'm not sure why we're not using it for rotate_pages()
as well.
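
Roughly, the remapping would then end up looking a lot like
rotate_pages(), just walking rows instead of columns and without the
temp array (untested sketch, not the actual patch):

static struct scatterlist *
remap_pages(struct drm_i915_gem_object *obj, unsigned int offset,
            unsigned int width, unsigned int height, unsigned int stride,
            struct sg_table *st, struct scatterlist *sg)
{
        unsigned int column, row;

        for (row = 0; row < height; row++) {
                for (column = 0; column < width; column++) {
                        st->nents++;
                        sg_set_page(sg, NULL, PAGE_SIZE, 0);
                        /* populates the obj->mm.get_page cache as a side effect */
                        sg_dma_address(sg) =
                                i915_gem_object_get_dma_address(obj, offset + column);
                        sg_dma_len(sg) = PAGE_SIZE;
                        sg = sg_next(sg);
                }
                offset += stride;
        }

        return sg;
}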

Tvrtko is opposed to populating the obj->mm.pages cache with no defined
release point. I say the mempressure and shrinker should do the right
thing, but it's a big if.

OK.

Well, looks to me like i915_gem_object_get_dma_address() is the
only convenient looking thing for iterating the pages without
drowning the code in irrelevant details about sgs and whatnot.
I suppose it should be possible to write some helpers that avoid
all that and don't need the temp array, but I'm not really
motivated enough to do that myself.

Keep it simple and use get_dma_address(). We can find ways to throw away
the cache later if need be.

I'd do it straight away. I think the cache for a large framebuffer, the
kind which needs remapping, could be quite big! And the more fragmented
the memory, the bigger the cache, so if it sticks around pointlessly for
the lifetime of the framebuffer it is a double whammy.

The tree is indexed with the ggtt offset so memory fragmentation
shouldn't matter I think. Or did I totally miss something?

I think it is indexed by page index, but the number of tree entries
depends on the number of sg chunks, which in turn depends on the system
memory fragmentation. Consecutive pages are stored as "exceptional"
entries so that is more efficient IIRC,

Didn't look any more efficient to me. It's still one page per slot
AFAICS. The exceptional entry just tells you where to find the start
of the current sg chunk.

but TBH I don't remember how
many of those it can store before it needs a new tree node. Anyways, if
I am not completely mistaken the tree metadata size is primarily
proportional to backing store fragmentation, and secondarily
proportional to object size. (Due to exceptional entry spill.)

The relative overhead should be highest for a single page object
(576/4096 = ~15%). For big objects it should be something around
0.2% AFAICS.
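
(Back of the envelope, assuming the usual 64-slot radix_tree_node at
576 bytes on 64-bit: one node covers up to 64 pages, i.e. 256KiB of
backing store, so a single page object pays 576/4096 ~= 14%, while a
fully populated node costs 576/(64*4096) ~= 0.22%.)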

It is a bit annoying that the radix tree does not have a helper to tell
us the number of allocated nodes. I don't remember how I measured this
last time... Will try to recall.

I wrote myself a helper to count the nodes and that seems to
agree with my back of the envelope calculations.
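
A rough sketch of what such a counting helper could look like (debug
only, pokes at the radix tree internals, assumes order-0 entries so
there are no sibling slots, and untested in this form):

static unsigned int count_nodes(struct radix_tree_node *node)
{
        unsigned int i, count = 1;

        for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
                void *entry = rcu_dereference_raw(node->slots[i]);

                /* recurse into child nodes, skip data/exceptional entries */
                if (radix_tree_is_internal_node(entry))
                        count += count_nodes((void *)((unsigned long)entry &
                                                      ~RADIX_TREE_ENTRY_MASK));
        }

        return count;
}

static unsigned int count_radix_tree_nodes(struct radix_tree_root *root)
{
        void *entry = rcu_dereference_raw(root->rnode);

        if (!radix_tree_is_internal_node(entry))
                return 0;       /* empty tree or single direct entry */

        return count_nodes((void *)((unsigned long)entry &
                                    ~RADIX_TREE_ENTRY_MASK));
}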

How many sg chunks, and then radix tree nodes, did that show for some
typical framebuffer? Or maybe not typical, but one which would require
remapping. Did you measure only after a fresh boot, or also on a
somewhat memory-constrained system (some memory pressure should mean
more fragmentation)?

If even that doesn't show the cache size as significant, then I guess
my worry that large framebuffers would be fragmented enough for it to
grow much bigger was unfounded, so feel free to go with it.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



