On Tue, 2022-01-04 at 13:56 +0000, Tvrtko Ursulin wrote:
>
> > The flow of events are as below:
> >
> > 1. guc sends notification that an error capture was done and ready to take.
> >    - at this point we copy the guc error captured dump into an interim store
> >      (larger buffer that can hold multiple captures).
> > 2. guc sends notification that a context was reset (after the prior)
> >    - this triggers a call to i915_gpu_coredump with the corresponding engine-mask
> >      from the context that was reset
> >    - i915_gpu_coredump proceeds to gather entire gpu state including driver state,
> >      global gpu state, engine state, context vmas and also engine registers. For the
> >      engine registers now call into the guc_capture code which merely needs to verify
> >      that GuC had already done a step 1 and we have data ready to be parsed.
>
> What about the time between the actual reset and receiving the context
> reset notification? Latter will contain intel_context->guc_id - can that
> be re-assigned or "retired" in between the two and so cause problems for
> matching the correct (or any) vmas?
>
No, it cannot, because it is only after the context reset notification that
i915 starts taking action against that context - and even that happens only
after the i915_gpu_coredump (engine-mask-of-context) call has been made.
That's what I've observed in the code flow.

> Regards,
>
> Tvrtko