On Thu, 2025-01-23 at 10:45 -0600, Lucas De Marchi wrote: > On Thu, Jan 23, 2025 at 08:52:13AM -0600, Jose Souza wrote: > > On Thu, 2025-01-23 at 08:30 -0600, Lucas De Marchi wrote: > > > On Thu, Jan 23, 2025 at 08:14:11AM -0600, Jose Souza wrote: > > > > On Wed, 2025-01-22 at 21:11 -0800, Lucas De Marchi wrote: > > > > > Having the exec queue snapshot inside a "GuC CT" section was always > > > > > wrong. Commit c28fd6c358db ("drm/xe/devcoredump: Improve section > > > > > headings and add tile info") tried to fix that bug, but with that also > > > > > broke the mesa tool that parses the devcoredump, hence it was reverted > > > > > in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa > > > > > debug tool"). > > > > > > > > > > With the mesa tool also fixed, this can propagate as a fix on both > > > > > kernel and userspace side to avoid unnecessary headache for a debug > > > > > feature. > > > > > > > > This will break older versions of the Mesa parser. Is this really necessary? > > > > > > See cover letter with the mesa MR that would fix the tool to follow the > > > kernel fix and work with both newer and older format. Linking it here > > > anyway: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33177 > > > > Still someone running the older version of the parser with a new Xe KMD would not be able to parse it. > > I understand that we can break it but is this really worthy? not in my opinion. > > because for the debug nature of this file, it's hard if we always keep > the cruft around. In 5 years developers implementing new decoders will > have to get data from random places because we will notice things are > wrongly placed. > > isn't it easier to do do it early so we don't increase the exposure > and just say "kernel screwed that up and fixed it", then propagate the > change in mesa in a stable release, just like we are doing in the > kernel? > > > > > > > > > It's a fix so simple that IMO it's better than carrying the cruft ad > > > infinitum on all the tools that may possibly parse the devcoredump. > > > > > > > > > > Is it worth breaking the tool? In my opinion, it is not. > > > > > > > > Also, do we need to discuss this now? Wouldn't it be better to focus on bringing the GuC log in first? > > > > > > That's what the second patch does. We need to discuss both now and > > > decide, otherwise we can't re-enable it and have either the guc log > > > parser or mesa's aubinator_error_decode_xe broken. > > > > I can't understand why it needs both, could you explain further? > > we are already discussing it, why not? Also as I said there's the guc > log parser, another tool, that is already expecting it in the other > place. So if we are going to re-enable the guc log, it's the best > opportunity to fix this, otherwise we will probably never do it and keep > accumulating. Much easier change a internal tool, no? The GuC log is in other section("**** GuC Log ****"), to me this change is more cosmetic than function and not worthy to break older versions of the parser. I can live with that but you will need to convince other Intel Mesa engineers, when I mentioned that Mesa parser was broken in a Mesa staff meeting, I was asked to push hard to get the Xe KMD patch reverted. That is why I think it should be taken care in another patch series as it could take more time... > > Lucas De Marchi > > > > > > > > > Lucas De Marchi > > > > > > > > > > > > > > > > > Cc: John Harrison <John.C.Harrison@xxxxxxxxx> > > > > > Cc: Julia Filipchuk <julia.filipchuk@xxxxxxxxx> > > > > > Cc: José Roberto de Souza <jose.souza@xxxxxxxxx> > > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > > Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool") > > > > > Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> > > > > > --- > > > > > drivers/gpu/drm/xe/xe_devcoredump.c | 6 +----- > > > > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c > > > > > index 81dc7795c0651..a7946a76777e7 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_devcoredump.c > > > > > +++ b/drivers/gpu/drm/xe/xe_devcoredump.c > > > > > @@ -119,11 +119,7 @@ static ssize_t __xe_devcoredump_read(char *buffer, size_t count, > > > > > drm_puts(&p, "\n**** GuC CT ****\n"); > > > > > xe_guc_ct_snapshot_print(ss->guc.ct, &p); > > > > > > > > > > - /* > > > > > - * Don't add a new section header here because the mesa debug decoder > > > > > - * tool expects the context information to be in the 'GuC CT' section. > > > > > - */ > > > > > - /* drm_puts(&p, "\n**** Contexts ****\n"); */ > > > > > + drm_puts(&p, "\n**** Contexts ****\n"); > > > > > xe_guc_exec_queue_snapshot_print(ss->ge, &p); > > > > > > > > > > drm_puts(&p, "\n**** Job ****\n"); > > > > > >