On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote: > On 2021-11-15 12:31, Lang Yu wrote: > > On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote: > >> On 2021-11-15 10:04, Lang Yu wrote: > >>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote: > >>>> On 2021-11-15 07:41, Lang Yu wrote: > >>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote: > >>>>>> On 2021-11-12 16:03, Christian König wrote: > >>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer: > >>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote: > >>>>>>>>> On 2021-11-12 13:47, Christian König wrote: > >>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Qz2VKDE0%2BcIuXZMy0IL3NLZAjeXLimwl8PPZh4Cp4iE%3D&reserved=0 > >>>>>>>>>> > >>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target(). > >>>>>>>>>> > >>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC. > >>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in > >>>>>>>>> dm_plane_helper_cleanup_fb. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb > >>>>>>>> This should say amdgpu_display_unpin_work_func. > >>>>>>> > >>>>>>> Ah! So that is the classic (e.g. non atomic) path? > >>>>>> > >>>>>> Presumably. > >>>>>> > >>>>>> > >>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned? > >>>>>>> > >>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called. > >>>>>>> > >>>>>>> E.g. not pin unbalance, but rather use after free. > >>>>>> > >>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg). > >>>>> > >>>>> > >>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will > >>>>> > >>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer > >>>>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq(). > >>>>> > >>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it? > >>>>> Next call. > >>>>> > >>>>> The problem is the pinned buffer of last call to > >>>>> amdgpu_display_crtc_page_flip_target() is not unpinned. > >>>> > >>>> It's unpinned in dce_v*_0_crtc_disable. > >>> > >>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable(). > >>> So it's not unpinned... > >> > >> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless. > >> > >> Have you checked for the issue I described below? Should be pretty easy to catch. > >> > >> > >>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1. > > > > Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is > > never called. > > It would be expected to happen when the screen turns off, e.g. due to DPMS. > > > > Though a single call to dce_v*_0_crtc_do_set_base() will > > only pin the BO, I found it will be unpinned in next call to > > dce_v*_0_crtc_do_set_base(). > > Yeah, that's the normal case when the new BO is different from the old one. > > To catch the case I described, try something like > > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > > index 18a7b3bd633b..5726bd87a355 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c > > @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc, > > return r; > > > > if (!atomic) { > > + WARN_ON_ONCE(target_fb == fb); > > r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM); > > if (unlikely(r != 0)) { > > amdgpu_bo_unreserve(abo); > I did some tests, the warning can be triggered. pin/unpin operations in *_crtc_do_set_base() and amdgpu_display_crtc_page_flip_target() are mixed. Regards, Lang > > -- > Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=jct%2F2NG7LamVRzzpkSy9XjQIDZRBIutSeK3VQ2jh524%3D&reserved=0 > Libre software enthusiast | Mesa and Xwayland developer