Re: [PATCH 1/2] drm/xe/devcoredump: Move exec queue snapshot to Contexts section

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2025-01-23 at 10:45 -0600, Lucas De Marchi wrote:
> On Thu, Jan 23, 2025 at 08:52:13AM -0600, Jose Souza wrote:
> > On Thu, 2025-01-23 at 08:30 -0600, Lucas De Marchi wrote:
> > > On Thu, Jan 23, 2025 at 08:14:11AM -0600, Jose Souza wrote:
> > > > On Wed, 2025-01-22 at 21:11 -0800, Lucas De Marchi wrote:
> > > > > Having the exec queue snapshot inside a "GuC CT" section was always
> > > > > wrong.  Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
> > > > > headings and add tile info") tried to fix that bug, but with that also
> > > > > broke the mesa tool that parses the devcoredump, hence it was reverted
> > > > > in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
> > > > > debug tool").
> > > > > 
> > > > > With the mesa tool also fixed, this can propagate as a fix on both
> > > > > kernel and userspace side to avoid unnecessary headache for a debug
> > > > > feature.
> > > > 
> > > > This will break older versions of the Mesa parser. Is this really necessary?
> > > 
> > > See cover letter with the mesa MR that would fix the tool to follow the
> > > kernel fix and work with both newer and older format. Linking it here
> > > anyway: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33177
> > 
> > Still someone running the older version of the parser with a new Xe KMD would not be able to parse it.
> > I understand that we can break it but is this really worthy? not in my opinion.
> 
> because for the debug nature of this file, it's hard if we always keep
> the cruft around. In 5 years developers implementing new decoders will
> have to get data from random places because we will notice things are
> wrongly placed.
> 
> isn't it easier to do do it early so we don't increase the exposure
> and just say "kernel screwed that up and fixed it", then propagate the
> change in mesa in a stable release, just like we are doing in the
> kernel?
> 
> > 
> > > 
> > > It's a fix so simple that IMO it's better than carrying the cruft ad
> > > infinitum on all the tools that may possibly parse the devcoredump.
> > > 
> > > 
> > > > Is it worth breaking the tool? In my opinion, it is not.
> > > > 
> > > > Also, do we need to discuss this now? Wouldn't it be better to focus on bringing the GuC log in first?
> > > 
> > > That's what the second patch does. We need to discuss both now and
> > > decide, otherwise we can't re-enable it and have either the guc log
> > > parser or mesa's aubinator_error_decode_xe broken.
> > 
> > I can't understand why it needs both, could you explain further?
> 
> we are already discussing it, why not?  Also as I said there's the guc
> log parser, another tool, that is already expecting it in the other
> place. So if we are going to re-enable the guc log, it's the best
> opportunity to fix this, otherwise we will probably never do it and keep
> accumulating.

Much easier change a internal tool, no?

The GuC log is in other section("**** GuC Log ****"), to me this change is more cosmetic than function and not worthy to break older versions of the
parser.

I can live with that but you will need to convince other Intel Mesa engineers, when I mentioned that Mesa parser was broken in a Mesa staff meeting, I
was asked to push hard to get the Xe KMD patch reverted.
That is why I think it should be taken care in another patch series as it could take more time...

> 
> Lucas De Marchi
> 
> > 
> > > 
> > > Lucas De Marchi
> > > 
> > > > 
> > > > > 
> > > > > Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
> > > > > Cc: Julia Filipchuk <julia.filipchuk@xxxxxxxxx>
> > > > > Cc: José Roberto de Souza <jose.souza@xxxxxxxxx>
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
> > > > > Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_devcoredump.c | 6 +-----
> > > > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > index 81dc7795c0651..a7946a76777e7 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> > > > > @@ -119,11 +119,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count,
> > > > >  	drm_puts(&p, "\n**** GuC CT ****\n");
> > > > >  	xe_guc_ct_snapshot_print(ss->guc.ct, &p);
> > > > > 
> > > > > -	/*
> > > > > -	 * Don't add a new section header here because the mesa debug decoder
> > > > > -	 * tool expects the context information to be in the 'GuC CT' section.
> > > > > -	 */
> > > > > -	/* drm_puts(&p, "\n**** Contexts ****\n"); */
> > > > > +	drm_puts(&p, "\n**** Contexts ****\n");
> > > > >  	xe_guc_exec_queue_snapshot_print(ss->ge, &p);
> > > > > 
> > > > >  	drm_puts(&p, "\n**** Job ****\n");
> > > > 
> > 





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux