Re: [PATCH 02/20] drm/i915: Switch to object allocations for page directories

Quoting Matthew Auld (2020-07-06 20:06:38)
> On Mon, 6 Jul 2020 at 07:19, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > The GEM object is grossly overweight for the practicality of tracking
> > large numbers of individual pages, yet it is currently our only
> > abstraction for tracking DMA allocations. Since those allocations need
> > to be reserved upfront before an operation, and since we need to break
> > away from simple system memory, we need to ditch using plain struct page
> > wrappers.
> >
> > In the process, we drop the WC mapping as we ended up clflushing
> > everything anyway due to various issues across a wider range of
> > platforms. Though in a future step, we need to drop the kmap_atomic
> > approach which suggests we need to pre-map all the pages and keep them
> > mapped.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> 
> <snip>

The other thing I'm toying with is whether to keep the unused preallocs
around on the ppgtt. The cost of the conservative allocations is
decidedly high in CI [thanks to memdebug tracking and verifying], as
each PD is itself a 4KiB table of pointers (as well as the 4KiB dma
page). Outside of CI, the issue is not as pressing, and if a workload
does not reach steady state quickly, then the extra allocations are just
one of many worries. In the steady state, we benefit from not having
surplus pages trapped in the ppgtt; that is the danger of the caching:
when should we trim it?
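
One speculative answer to "when should we trim it?" is to let the core
MM decide and reap the spares via a shrinker, so they are only freed
under memory pressure. A sketch only, not something this patch does;
the pd_shrinker/nr_spare_pd members and the pd_cache_pop()/
pd_cache_free() helpers are made up:

static unsigned long pd_cache_count(struct shrinker *s,
				    struct shrink_control *sc)
{
	struct i915_ppgtt *ppgtt =
		container_of(s, struct i915_ppgtt, pd_shrinker);

	return READ_ONCE(ppgtt->nr_spare_pd);
}

static unsigned long pd_cache_scan(struct shrinker *s,
				   struct shrink_control *sc)
{
	struct i915_ppgtt *ppgtt =
		container_of(s, struct i915_ppgtt, pd_shrinker);
	unsigned long freed = 0;

	while (freed < sc->nr_to_scan) {
		struct i915_page_directory *pd = pd_cache_pop(ppgtt);

		if (!pd)
			break;

		pd_cache_free(ppgtt, pd); /* release table + dma page */
		freed++;
	}

	return freed ?: SHRINK_STOP;
}

/* registered once at ppgtt creation:
 *	ppgtt->pd_shrinker.count_objects = pd_cache_count;
 *	ppgtt->pd_shrinker.scan_objects = pd_cache_scan;
 *	ppgtt->pd_shrinker.seeks = DEFAULT_SEEKS;
 *	register_shrinker(&ppgtt->pd_shrinker);
 */

That would keep the steady-state reuse, but give a natural trim point
tied to actual memory pressure rather than a heuristic of our own.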

[Previously we only allocated on demand, but kept a *small* number of WC
pages around because converting a page to/from WC was expensive.]

If there's a good answer for when we can/should free the surplus cache,
it's probably worth pursuing. Or we may deem it worthwhile to keep the
cache limited to 15 entries [reusing a pagevec], along the lines of the
sketch below.
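
Purely illustrative (only the 15-entry sizing comes from pagevec; the
type and helpers are made up):

#define PD_STASH_SIZE 15 /* mirrors PAGEVEC_SIZE */

struct pd_stash {
	unsigned int nr;
	struct i915_page_directory *pd[PD_STASH_SIZE];
};

static struct i915_page_directory *pd_stash_get(struct pd_stash *stash)
{
	return stash->nr ? stash->pd[--stash->nr] : NULL;
}

/* Returns false when the stash is full; the caller frees the surplus. */
static bool pd_stash_put(struct pd_stash *stash,
			 struct i915_page_directory *pd)
{
	if (stash->nr == PD_STASH_SIZE)
		return false;

	stash->pd[stash->nr++] = pd;
	return true;
}

The fixed bound means a burst of frees spills back into the allocator
instead of accumulating on the ppgtt, which answers the trimming
question by construction.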

Overallocation is a pain because of the preallocation requirement: we
basically have to hold at least 2 PDs for each level on top of the
actual span, for every vma, even when bundling the insertions, as we
don't know which entries will be used until much later. So we almost
certainly overallocate 4 PDs [16KiB system + 16KiB dma] for every
single vma. Even a 15-entry stash will be quickly exhausted; oh well.
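
To put numbers on that: the tables needed at each level are just the
entries the range touches, and since the final address is unknown at
reserve time we have to assume a straddle at every level. A sketch of
the counting (gen8 4-level shifts; not the patch's exact accounting):

static u64 worst_case_pds(u64 start, u64 length)
{
	/* gen8 4-level spans: a PT maps 2M, a PD 1G, a PDP 512G */
	static const unsigned int shift[] = { 21, 30, 39 };
	u64 last = start + length - 1;
	u64 count = 0;
	int i;

	for (i = 0; i < ARRAY_SIZE(shift); i++)
		count += (last >> shift[i]) - (start >> shift[i]) + 1;

	return count;
}

An unaligned reservation can double the count at each level versus the
aligned case, and each unit here costs a 4KiB table of pointers plus a
4KiB dma page.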
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


