On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote: > On 2021-11-15 10:04, Lang Yu wrote: > > On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote: > >> On 2021-11-15 07:41, Lang Yu wrote: > >>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote: > >>>> On 2021-11-12 16:03, Christian König wrote: > >>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer: > >>>>>> On 2021-11-12 15:29, Michel Dänzer wrote: > >>>>>>> On 2021-11-12 13:47, Christian König wrote: > >>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3D&reserved=0 > >>>>>>>> > >>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target(). > >>>>>>>> > >>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC. > >>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in > >>>>>>> dm_plane_helper_cleanup_fb. > >>>>>>> > >>>>>>> > >>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb > >>>>>> This should say amdgpu_display_unpin_work_func. > >>>>> > >>>>> Ah! So that is the classic (e.g. non atomic) path? > >>>> > >>>> Presumably. > >>>> > >>>> > >>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned? > >>>>> > >>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called. > >>>>> > >>>>> E.g. not pin unbalance, but rather use after free. > >>>> > >>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg). > >>> > >>> > >>> Actually, each call to amdgpu_display_crtc_page_flip_target() will > >>> > >>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer > >>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq(). > >>> > >>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it? > >>> Next call. > >>> > >>> The problem is the pinned buffer of last call to > >>> amdgpu_display_crtc_page_flip_target() is not unpinned. > >> > >> It's unpinned in dce_v*_0_crtc_disable. > > > > I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable(). > > So it's not unpinned... > > __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless. > > Have you checked for the issue I described below? Should be pretty easy to catch. > > > >> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1. Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is never called. Though a single call to dce_v*_0_crtc_do_set_base() will only pin the BO, I found it will be unpinned in next call to dce_v*_0_crtc_do_set_base(). Anyway, these pin/unpin operations looks error-prone. Regards, Lang > > -- > Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=EppaFKsy78tecP6oIVVuBh3enb%2Fu8jurugqEwUxDCYk%3D&reserved=0 > Libre software enthusiast | Mesa and Xwayland developer