On Thu, Aug 26, 2021 at 03:59:30PM +0200, Thomas Hellström wrote:
> On Thu, 2021-08-26 at 14:44 +0200, Daniel Vetter wrote:
> > On Thu, Aug 26, 2021 at 12:45:14PM +0200, Thomas Hellström wrote:
> > > Pinned contexts, like the migrate contexts, need a reset after
> > > resume since their context image may have been lost. Also the GuC
> > > needs to register pinned contexts.
> > >
> > > Add a list to struct intel_engine_cs where we add all pinned
> > > contexts on creation, and traverse that list at resume time to
> > > reset the pinned contexts.
> > >
> > > This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest
> > > for now, but proper LMEM backup / restore is needed for full
> > > suspend functionality. However, note that even with full LMEM
> > > backup / restore it may be desirable to keep the reset, since
> > > backing up the migrate context images must happen using memcpy()
> > > after the migrate context has become inactive, and for performance
> > > and other reasons we want to avoid memcpy() from LMEM.
> > >
> > > Also traverse the list at guc_init_lrc_mapping(), calling
> > > guc_kernel_context_pin() for the pinned contexts, as is already
> > > done for the kernel context.
> > >
> > > v2:
> > > - Don't reset the contexts on each __engine_unpark() but rather at
> > >   resume time (Chris Wilson).
> > >
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> > > Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> > > Cc: Brost Matthew <matthew.brost@xxxxxxxxx>
> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
> >
> > I guess it got lost, but a few weeks ago I stumbled over this and
> > wondered why we're even setting up a separate context, or at least
> > why a separate vm compared to the gt->vm we already have?
> >
> > Even on chips with bazillions of copy engines the plan is that we
> > only reserve a single one for kernel migrations, so there's not
> > really a need for quite this much generality, I think. Maybe check
> > with Jon Bloomfield on this.
>
> Are you referring to the generality of the migration code itself or to
> the generality of using a list in this patch to register multiple
> pinned contexts to an engine?
>
> For the migration code itself, I figured reserving one copy engine for
> migration was strictly needed for recoverable page-faults? In the
> current version we're not doing that, but just tying a pinned
> migration context to the first available copy engine on the gt, to be
> used when we don't have a ww context available to pin a separate
> context using a random copy engine. Note also the ring size of the
> migration contexts; since we're populating the page-tables for each
> blit, it's not hard to fill the ring, and in the end multiple
> contexts, I guess, boil down to avoiding priority inversion on
> migration, including blocking high-priority kernel context tasks.
>
> As for not using the gt->vm, I'm not completely sure if we can do our
> special page-table setup on that. Got to defer that question to
> Chris, but once Ram's work of supporting 64K LMEM PTEs on that has
> landed, I guess we could easily reuse the gt->vm if possible and
> suitable.

Just on why we have gt->vm and then also the migration vm.
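(Aside, for anyone skimming: the list-based scheme from the commit
message above amounts to roughly the sketch below. This is a minimal
illustration only -- the struct and function names are made up, not the
actual i915 symbols from the patch.)

    #include <linux/list.h>

    struct pinned_ctx {
            struct list_head pinned_link;   /* link in engine list */
            /* ... context image, ring, etc. ... */
    };

    struct engine {
            struct list_head pinned_contexts;
    };

    /* Hypothetical helper: rewrite the (possibly lost) context image. */
    static void reset_context_image(struct pinned_ctx *ce)
    {
            /* e.g. re-emit the default context state for ce */
    }

    static void engine_init(struct engine *engine)
    {
            INIT_LIST_HEAD(&engine->pinned_contexts);
    }

    /* On creation, every pinned context registers itself with its
     * engine, so resume code can find them all without knowing about
     * each user (kernel context, migrate context, ...). */
    static void register_pinned_context(struct engine *engine,
                                        struct pinned_ctx *ce)
    {
            list_add_tail(&ce->pinned_link, &engine->pinned_contexts);
    }

    /* At resume time, walk the list once and reset each pinned
     * context, since its image may have been lost over suspend. The
     * same walk can drive GuC registration from
     * guc_init_lrc_mapping(), per the commit message. */
    static void engine_resume_pinned_contexts(struct engine *engine)
    {
            struct pinned_ctx *ce;

            list_for_each_entry(ce, &engine->pinned_contexts,
                                pinned_link)
                    reset_context_image(ce);
    }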
The old mail I typed up on this gt->vm question:
https://lore.kernel.org/dri-devel/CAKMK7uG6g+DQQEcjqeA6=Z2ENHogaMuvKERDgKm5jKq3u+a1jQ@xxxxxxxxxxxxxx/

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch