Re: [PATCH v2 2/3] drm/i915/guc: Close deregister-context race against CT-loss

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Additional update from the most recent testing.

When relying solely on guc_lrc_desc_unpin getting a failure from deregister_context
as a means for identifying that we are in the "deregister-context-vs-suspend-late" race,
it is too late a location to handle this safely. This is because one of the
first things destroyed_worker_func does it to take a gt pm wakeref - which triggers
the gt_unpark function that does a whole lot bunch of other flows including triggering more
workers and taking additional refs. That said, its best to not even call
deregister_destroyed_contexts from the worker when !intel_guc_is_ready (ct-is-disabled).

...alan


On Fri, 2023-08-25 at 11:54 -0700, Teres Alexis, Alan Previn wrote:
> just a follow up note-to-self:
> 
> On Tue, 2023-08-15 at 12:08 -0700, Teres Alexis, Alan Previn wrote:
> > On Tue, 2023-08-15 at 09:56 -0400, Vivi, Rodrigo wrote:
> > > On Mon, Aug 14, 2023 at 06:12:09PM -0700, Alan Previn wrote:
> > > > 
> [snip]
> 
> in guc_submission_send_busy_loop, we are incrementing the following
> that needs to be decremented if the function fails.
> 
> atomic_inc(&guc->outstanding_submission_g2h);
> 
> also, it seems that even with thie unroll design - we are still
> leaking a wakeref elsewhere. this is despite a cleaner redesign of
> flows in function "guc_lrc_desc_unpin"
> (discussed earlier that wasnt very readible).
> 
> will re-rev today but will probably need more follow ups
> tracking that one more leaking gt-wakeref (one in thousands-cycles)
> but at least now we are not hanging mid-suspend.. we bail from suspend
> with useful kernel messages.
> 
> 
> 
> 





[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux