Hey Ludovic, On 3/8/24, Ludovic.Desroches@xxxxxxxxxxxxx <Ludovic.Desroches@xxxxxxxxxxxxx> wrote: > On 3/6/24 20:49, Pierre-Louis Dourneau wrote: > > > > From: Arnaud Lahache <a.lahache@xxxxxxxxxx> > > > > Fixes a memory leak occurring on each modeset update. > > > > Running a program such as libdrm's modetest[0] with this driver exhausts > > all available memory after a few minutes. Enabling kmemleak yields a series > > of such leak reports: > > > > unreferenced object 0xc21acf40 (size 64): > > comm "modetest", pid 210, jiffies 4294942460 (age 331.240s) > > hex dump (first 32 bytes): > > 00 a0 a1 c1 01 00 00 00 ff ff ff ff 4c cf 1a c2 ............L... > > 4c cf 1a c2 ff ff ff ff 58 cf 1a c2 58 cf 1a c2 L.......X...X... > > backtrace: > > [<d68b3e09>] kmalloc_trace+0x18/0x24 > > [<f858a020>] drm_atomic_helper_setup_commit+0x1e0/0x7e0 > > [<26e8ab04>] drm_atomic_helper_commit+0x40/0x160 > > [<49708b0c>] drm_atomic_commit+0xa8/0xf0 > > [<e58c2942>] drm_mode_obj_set_property_ioctl+0x154/0x3d8 > > [<5e97e57d>] drm_ioctl+0x200/0x3c4 > > [<ed514ba1>] sys_ioctl+0x240/0xb48 > > [<26aab344>] ret_fast_syscall+0x0/0x44 > > > > drm_atomic_helper_setup_commit() acquires a reference to a drm_crtc_commit > > for each CRTC and associated connectors and planes involved in a modeset. > > 64-byte leaks map well to the size of a drm_crtc_commit on 32-bit > > architectures, and the second 4-byte chunk in the hex dump above awfully > > looks like a reference count. > > > > We tracked this missing reference decrement down to the driver's > > atmel_hlcdc_plane_atomic_destroy_state() callback. Its CRTC counterpart, > > atmel_hlcdc_crtc_destroy_state(), calls into the drm_atomic helpers and > > properly releases its references to the commit. Planes didn't. Using the > > default helper for that purpose, __drm_atomic_helper_plane_destroy_state(), > > fixes the leak and avoids reimplementing the same logic in the driver. > > > > [0]: https://gitlab.freedesktop.org/mesa/drm/-/tree/main/tests/modetest > > Invoke with `modetest -M atmel-hlcdc -s 32:#0 -v`, assuming 32 is the > > ID of a connector. > > > > Signed-off-by: Arnaud Lahache <a.lahache@xxxxxxxxxx> > > Co-developed-by: Pierre-Louis Dourneau <pl.dourneau@xxxxxxxxxx> > > Signed-off-by: Pierre-Louis Dourneau <pl.dourneau@xxxxxxxxxx> > > Co-developed-by: Benoît Alcaina <b.alcaina@xxxxxxxxxx> > > Signed-off-by: Benoît Alcaina <b.alcaina@xxxxxxxxxx> > > --- > > As far as our testing goes, we've been running 6 of our production units > > with this patch for more than 2 weeks as per the date this patch is sent > > out. We can report stable memory usage. > > Hello Arnaud, > > This patch fixes the memory leak but introduces a crash on my side when > exiting a graphics app using the Microchip graphics library. > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 0 PID: 201 at lib/refcount.c:28 > refcount_warn_saturate+0x58/0x130 > > > > refcount_t: underflow; use-after-free. > > > > Modules linked in: > > > > CPU: 0 PID: 201 Comm: basic Not tainted 6.1.55-linux4microchip-2023.10+ > #65 > > > Hardware name: Microchip SAM9X60 > > > > unwind_backtrace from show_stack+0x10/0x18 > > > > show_stack from dump_stack_lvl+0x28/0x34 > > > > dump_stack_lvl from __warn+0x70/0xb8 > > > > __warn from warn_slowpath_fmt+0x78/0xac > > > > warn_slowpath_fmt from refcount_warn_saturate+0x58/0x130 > > > > refcount_warn_saturate from kref_put+0x54/0x5c > > > > kref_put from drm_fb_release+0x100/0x11c > > > > drm_fb_release from drm_file_free+0xcc/0x1bc > > > > drm_file_free from drm_release+0x44/0x94 > > > > drm_release from __fput+0xf0/0x20c > > > > __fput from task_work_run+0x8c/0xb0 > > > > task_work_run from do_exit+0x310/0x760 > > > > do_exit from sys_exit_group+0x0/0x14 > > > > sys_exit_group from get_signal+0x524/0x64c > > > > get_signal from do_work_pending+0xf4/0x398 > > > > do_work_pending from slow_work_pending+0xc/0x24 > > > > Exception stack(0xc9991fb0 to 0xc9991ff8) > > > > 1fa0: 0054dd48 00000000 005331f8 > 00000001 > > > 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000 > 0051b424 > > > 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff > > > > ---[ end trace 0000000000000000 ]--- > > > > 8<--- cut here --- > > > > Unable to handle kernel NULL pointer dereference at virtual address > 00000004 > > > [00000004] *pgd=00000000 > > > > Internal error: Oops: 805 [#1] ARM > > > > Modules linked in: > > > > CPU: 0 PID: 201 Comm: basic Tainted: G W > 6.1.55-linux4microchip-2023.10+ #65 > > > Hardware name: Microchip SAM9X60 > > > > PC is at drm_fb_release+0xc0/0x11c > > > > LR is at drm_fb_release+0x100/0x11c > > > > pc : [<c0329b10>] lr : [<c0329b50>] psr: 80000013 > > > > sp : c9991e48 ip : 00000000 fp : 0051b424 > > > > r10: c174a5f0 r9 : 00000000 r8 : c2188074 > > > > r7 : 60000013 r6 : c9991e5c r5 : ffffff8c r4 : c2188048 > > > > r3 : c1590994 r2 : c2188048 r1 : 00000000 r0 : c1590920 > > > > Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none > > > > Control: 0005317f Table: 2384c000 DAC: 00000051 > > > > Register r0 information: slab kmalloc-192 start c1590920 pointer offset > 0 size 192 > > > Register r1 information: NULL pointer > > > > Register r2 information: slab kmalloc-192 start c2188000 pointer offset > 72 size 192 > > > Register r3 information: slab kmalloc-192 start c1590920 pointer offset > 116 size 192 > > > Register r4 information: slab kmalloc-192 start c2188000 pointer offset > 72 size 192 > > > Register r5 information: non-paged memory > > > > Register r6 information: 2-page vmalloc region starting at 0xc9990000 > allocated at kernel_clone+0xb4/0x204 > > > Register r7 information: non-paged memory > > > > Register r8 information: slab kmalloc-192 start c2188000 pointer offset > 116 size 192 > > > Register r9 information: NULL pointer > > > > Register r10 information: slab task_struct start c174a140 pointer offset > 1200 > > > Register r11 information: non-paged memory > > > > Register r12 information: NULL pointer > > > > Process basic (pid: 201, stack limit = 0x41541c7b) > > > > Stack: (0xc9991e48 to 0xc9992000) > 1e40: 00000001 00000000 00000000 00000000 00000000 > c9991e5c > > > 1e60: c9991e5c 5c44a9dd c2188000 c1621800 c2188060 c03146e4 000000c9 > 0000e200 > > > 1e80: 00000001 00000000 00000000 c1621800 c38f2cc0 c0fb63e0 c0c1ce70 > c0f99560 > > > 1ea0: 00000000 c0314a5c c38f2cc0 c10ee6e8 000a201f c00c66a0 c38f2cc0 > 00000001 > > > 1ec0: c09f1a4c c00c67fc c38f2e00 c174a140 c0a20b1c c2232a50 c9991ef4 > c002f000 > > > 1ee0: c174a140 c2232a20 c174a140 c001a094 00000002 00000008 c383c880 > 5c44a9dd > > > 1f00: 00000002 c383c880 0051b424 c001a6a8 00000009 c0024af0 beb53c50 > 00000000 > > > 1f20: 00000000 5c44a9dd 00000000 00000000 c9991fb0 c174a140 00000000 > 00000000 > > > 1f40: 00000000 00000000 0051b424 c000c244 00000000 00000000 00000000 > 00000000 > > > 1f60: 00000000 00000000 00000009 00000000 00000000 00000000 00000000 > 00000000 > > > 1f80: 00000000 00000000 00000000 5c44a9dd b68357dc 20000010 ffffffff > 0005317f > > > 1fa0: 00000000 c174a140 43ab8000 c00084dc 0054dd48 00000000 005331f8 > 00000001 > > > 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000 > 0051b424 > > > 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff 00000000 > 00000000 > > > drm_fb_release from drm_file_free+0xcc/0x1bc > > > > drm_file_free from drm_release+0x44/0x94 > > > > drm_release from __fput+0xf0/0x20c > > > > __fput from task_work_run+0x8c/0xb0 > > > > task_work_run from do_exit+0x310/0x760 > > > > do_exit from sys_exit_group+0x0/0x14 > > > > sys_exit_group from get_signal+0x524/0x64c > > > > get_signal from do_work_pending+0xf4/0x398 > > > > do_work_pending from slow_work_pending+0xc/0x24 > > > > Exception stack(0xc9991fb0 to 0xc9991ff8) > > > > 1fa0: 0054dd48 00000000 005331f8 > 00000001 > > > 1fc0: 00533270 00000002 00000000 404c8000 beb54170 0051b448 43ab8000 > 0051b424 > > > 1fe0: b6cc1d98 beb54040 b6c224fc b68357dc 20000010 ffffffff > > > > Code: e590c018 e5902078 e5901074 e35c0001 (e5812004) > > > > ---[ end trace 00000000c0a20288 ]--- > > > > Fixing recursive fault but reboot is needed! We've tried to reproduce your crash with 6.1.22-linux4microchip-2023.04, to no avail. We'll try to upgrade to 6.1.55-linux4microchip-2023.10 (your version) and test again. Is there a particular EGT program you recommend to trigger the crash? Or any should do? We've limited ourselves to a few of the samples provided with the library, namely egt_basic, egt_water, egt_charts, and egt_i18n. > The memory leak had been introduced with this commit: > > > commit eec44d44a3d2d00c8f663f13555d3dd280b1ea3f > Author: Daniel Vetter <daniel.vetter@xxxxxxxx> > Date: Thu Jan 21 16:29:54 2021 +0100 > > > > drm/atmel: Use drm_atomic_helper_commit > > One of these drivers that predates the nonblocking support in helpers, > and hand-rolled its own thing. Entirely not anything specific here, we > can just delete it all and replace it with the helper version. > > Could also perhaps use the drm_mode_config_helper_suspend/resume > stuff, for another few lines deleted. But I'm not looking at that > stuff, I'm just going through all the atomic commit functions and make > sure they have properly annotated dma-fence critical sections > everywhere. > > > I think this move to the drm atomic helper should have gone with an > update on the release side as well. There is probably something wrong > with the atomic_destroy_state callbacks provided by the atmel-hlcdc driver. > > Regards, > Ludovic > > > > > Admittedly, our usage of the DRM uAPI is rather simple: create 2 dumb > > buffers, do an initial MODE_SETCRTC, and then MODE_PAGE_FLIP between the > > two dumb buffers at the rate of once per second on average. We haven't > > evaluated more complex workloads. > > > > drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c > > index daa508504f47..390c4fc62af7 100644 > > --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c > > +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c > > @@ -934,8 +934,7 @@ static void atmel_hlcdc_plane_atomic_destroy_state(struct drm_plane *p, > > state->dscrs[i]->self); > > } > > > > - if (s->fb) > > - drm_framebuffer_put(s->fb); > > + __drm_atomic_helper_plane_destroy_state(s); > > > > kfree(state); > > } > > > > base-commit: 9dfc46c87cdc8f5a42a71de247a744a6b8188980 > > -- > > 2.34.1 > > Pierre-Louis