Revisiting below hunk of patch-7 comment, as per offline discussion with Matt, there is little benefit to even making that guc-id lookup because: 1. the delay between the context reset notification (when the vmas are copied and when we verify we had received a guc err capture dump) may be subjectively large enough and not tethered that the guc-id may have already been re-assigned. 2. I was really looking for some kind of unique context handle to print out that could be correlated (by user inspecting the dump) back to a unique app or process or context-id but cant find such a param in struct intel_context. As part of further reviewing the end to end flows and possible error scenarios, there also may potentially be a mismatch between "which context was reset by guc at time-n" vs "which context's vma buffers is being printed out at time-n+x" if we are experiencing back-to-back resets and the user dumped the debugfs x-time later. (Recap: First, guc notifies capture event, second, guc notifies context reset during which we trigger i915_gpu_coredump. In this second step, the vma's are dumped and we verify that the guc capture happened but don't parse the guc-err-capture-logs yet. Third step is when user triggers the debugfs to dump which is when we parse the error capture logs.) As a fix, what we can do in the guc_error_capture report out is to ensure that we dont re-print the previously dumped vmas if we end up finding multiple guc-error-capture dumps since the i915_gpu_coredump would have only captured the vma's for the very first context that was reset. And with guc-submission, that would always correlate to the "next-yet-to-be-parsed" guc-err-capture dump (since the guc-error-capture logs are large enough to hold data for multiple dumps). The changes (removal of below-hunk and adding of only-print-the-first-vma") is trivial but i felt it warranted a good explanation. Apologies for the inbox noise. ...alan On Tue, 2021-12-07 at 22:32 -0800, Alan Previn Teres Alexis wrote: > Thanks again for the detailed review here. > Will fix all the rest on next rev. > One special response for this one: > > > On Tue, 2021-12-07 at 16:22 -0800, Matthew Brost wrote: > > On Mon, Nov 22, 2021 at 03:04:02PM -0800, Alan Previn wrote: > > > + if (datatype == GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE) { > > > + GCAP_PRINT_GUC_INST_INFO(i915, ebuf, data); > > > + eng_inst = FIELD_GET(GUC_CAPTURE_DATAHDR_SRC_INSTANCE, data.info); > > > + eng = guc_lookup_engine(guc, engineclass, eng_inst); > > > + if (eng) { > > > + GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng); > > > + } else { > > > + PRINT(&i915->drm, ebuf, " i915-Eng-Lookup Fail!\n"); > > > + } > > > + ce = guc_context_lookup(guc, data.guc_ctx_id); > > > > You are going to need to reference count the 'ce' here. See > > intel_guc_context_reset_process_msg for an example. > > > > Oh crap - i missed this one - which you had explicitly mentioned offline when i was doing the > development. Sorry about that i just totally missed it from my todo-notes. > > ...alan