[adding a bunch of lists and people as well as Timur Tabi, who authored
the culprit]

Sid Pranjale, thx for the report. FWIW, I'm just replying to add this to
the regression tracking to ensure it does not fall through the cracks.
Nevertheless let me mention two things while at it:

On 29.02.24 18:58, Sid Pranjale wrote:
> Nouveau deallocates a few buffers post GPU init which are required for GPU suspend/resume to function correctly.
> This is likely not as big an issue on systems where the NVGPU is the only GPU, but on multi-GPU set ups it leads to a regression where the kernel module errors and results in a system-wide rendering freeze.

These lines are too long, see
Documentation/process/submitting-patches.rst for details.

> This commit addresses that regression by moving the two buffers required for suspend and resume to be deallocated at driver unload instead of post init.
>
> Fixes: 042b5f8 ("drm/nouveau: fix several DMA buffer leaks")

And that should be:

Fixes: 042b5f83841fbf ("drm/nouveau: fix several DMA buffer leaks")

> Signed-off-by: Sid Pranjale <sidpranjale127@xxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a64c81385..a73a5b589 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
>  	/* Release the DMA buffers that were needed only for boot and init */
>  	nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->libos);
> -	nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> -	nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>
>  	return ret;
>  }
> @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>
>  	r535_gsp_dtor_fws(gsp);
>
> +	nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> +	nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->logintr);

To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced 042b5f83841fbf
#regzbot title drm/nouveau: rendering freezes with multi-GPU setup
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.