Quoting Mika Kuoppala (2019-06-14 17:43:50) > If we setup backing phys page for 3lvl pdps, even they > are not used, we lose 5 pages per ppgtt. > > Trading this memory on bsw, we gain more common code paths for all > gen8+ directory manipulation. And those paths are now void of checks > for page directory type, making the hot paths faster. > > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 112 +++++++++++++++++----------- > 1 file changed, 68 insertions(+), 44 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index ba2802c25d13..c76c92072d54 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -714,22 +714,14 @@ static struct i915_page_directory *alloc_pd(struct i915_address_space *vm) > return pd; > } > > -static inline bool pd_has_phys_page(const struct i915_page_directory * const pd) > -{ > - return pd->base.page; > -} > - > static void free_pd(struct i915_address_space *vm, > struct i915_page_directory *pd) > { > - if (likely(pd_has_phys_page(pd))) > - cleanup_page_dma(vm, &pd->base); > - > + cleanup_page_dma(vm, &pd->base); > kfree(pd); > } > > #define init_pd(vm, pd, to) { \ > - GEM_DEBUG_BUG_ON(!pd_has_phys_page(pd)); \ > fill_px((vm), (pd), gen8_pde_encode(px_dma(to), I915_CACHE_LLC)); \ > memset_p((pd)->entry, (to), 512); \ > } > @@ -747,8 +739,7 @@ static void __set_pd_entry(struct i915_page_directory * const pd, > > #define set_pd_entry(pd, pde, to) ({ \ > (pd)->entry[(pde)] = (to); \ > - if (likely(pd_has_phys_page(pd))) \ > - __set_pd_entry((pd), (pde), \ > + __set_pd_entry((pd), (pde), \ > gen8_pde_encode(px_dma(to), I915_CACHE_LLC)); \ > }) The macro seems like it is redundant now as well, since __set_pd_entry should be able to set pd->entry[pde] = to. Or at least one step too large. Ok, I'm willing to make the sacrifice for simplicity. I think the extra allocation will be in the noise, you need a lot of contexts with small workloads for it to be a significant factor i.e. microbenchmarks. And even GL microbenchmarks tend to do some GL so aren't that sensitive to context allocation rates. OglDrvCtx springs to mind. That would be a reasonable sanity check that this is indeed in the noise. The only question is does this not become simpler if we do this earlier? -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx