On Tue, Jul 13, 2021 at 02:04:31PM +0100, Matthew Auld wrote:
> We skip filling out the pt with scratch entries if the va range covers
> the entire pt, since we later have to fill it with the PTEs for the
> object pages anyway. However this might leave open a small window where
> the PTEs don't point to anything valid for the HW to consume.
>
> When for example using 2M GTT pages this fill_px() showed up as being
> quite significant in perf measurements, and ends up being completely
> wasted since we ignore the pt and just use the pde directly.
>
> Anyway, currently we have our PTE construction split between alloc and
> insert, which is probably slightly iffy nowadays, since the alloc
> doesn't actually allocate anything anymore, instead it just sets up the
> page directories and points the PTEs at the scratch page. Later when we
> do the insert step we re-program the PTEs again. Better might be to
> squash the alloc and insert into a single step, then bringing back this
> optimisation(along with some others) should be possible.
>
> Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx>
> Cc: Jon Bloomfield <jon.bloomfield@xxxxxxxxx>
> Cc: Chris Wilson <chris.p.wilson@xxxxxxxxx>
> Cc: Daniel Vetter <daniel@xxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # v4.15+

This is some impressively convoluted code, and I'm scared. But as far as I
managed to convince myself, your story here checks out. Problem will be a
bit that this code moved around a _lot_ so we'll need a lot of dedicated
backports :-(

Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>

> ---
>  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> index 3d02c726c746..6e0e52eeb87a 100644
> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
> @@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
>  			__i915_gem_object_pin_pages(pt->base);
>  			i915_gem_object_make_unshrinkable(pt->base);
>
> -			if (lvl ||
> -			    gen8_pt_count(*start, end) < I915_PDES ||
> -			    intel_vgpu_active(vm->i915))
> -				fill_px(pt, vm->scratch[lvl]->encode);
> +			fill_px(pt, vm->scratch[lvl]->encode);
>
>  			spin_lock(&pd->lock);
>  			if (likely(!pd->entry[idx])) {
> --
> 2.26.3
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch