Just to follow up on this, I decided not to file the report, since Nick's latest patch that "Fixes:" this has fixed my issue. Thanks for the good work on that one Nick. Might want to make sure that makes it in to the 5.7 fixes :) Thanks again guys. ~Matt On 5/5/20 11:59 AM, Kazlauskas, Nicholas wrote: > Can you file a full bug report on the gitlab tracker? > > FreeSync is still working on my Navi setups with this patch applied, and > this patch is essentially just a revert of another patch already (where > FreeSync worked before). > > I can understand the v2 of this series causing issues, but the v1 > shouldn't be - so I'd like to understand more about the setup where this > is causing issues - ASIC, OS, compositor, displays, dmesg log, X log, etc. > > Regards, > Nicholas Kazlauskas > > On 2020-05-05 1:03 p.m., Alex Deucher wrote: >> Mario or Nick any thoughts? >> >> Alex >> >> On Mon, May 4, 2020 at 1:35 PM Matt Coffin <mcoffin13@xxxxxxxxx> wrote: >>> >>> Hey guys, >>> >>> This is still an issue for me, and I'm still having to run a patch to >>> revert this as of 5.7-rc4. To avoid breaking a lot of people's Navi >>> setups in 5.7, is there any news on this? Has anyone else at the very >>> least been able to reproduce the problem? >>> >>> It happens for me in every single program that mesa allows to utilize >>> variable refresh rates, and reverting it "fixes" the issue. >>> >>> Cheers, and sorry for the extra email, just making sure this is still on >>> someone's radar, >>> Matt >>> >>> On 4/14/20 5:32 PM, Matt Coffin wrote: >>>> Hey everyone, >>>> >>>> This patch broke variable refresh rate in games (all that I've tried so >>>> far... Project CARS 2, DiRT Rally 2.0, Assetto Corsa Competizione) as >>>> well as a simple freesync tester application. >>>> >>>> FreeSync tester I've been using: https://github.com/Nixola/VRRTest >>>> >>>> I'm not at all familiar with the page flipping code, so it would >>>> take me >>>> a long time to find the *right* way to fix it, but does someone else >>>> see >>>> why it would do that? >>>> >>>> The symptom is that the refresh rate of the display constantly bounces >>>> between the two ends of the FreeSync range (for me 40 -> 144), and the >>>> game stutters like a madman. >>>> >>>> Any help on where to start, ideas on how to fix it (other than just >>>> revert this commit, which I've done in the interim), or alternative >>>> patches would be appreciated. >>>> >>>> Thanks in advance for the work/help, >>>> Matt >>>> >>>> On 3/13/20 8:42 AM, Michel Dänzer wrote: >>>>> On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote: >>>>>> On 2020-03-12 10:32 a.m., Alex Deucher wrote: >>>>>>> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner >>>>>>> <mario.kleiner.de@xxxxxxxxx> wrote: >>>>>>>> >>>>>>>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user >>>>>>>> events at vsartup for DCN")' introduces a new way of pageflip >>>>>>>> completion handling for DCN, and some trouble. >>>>>>>> >>>>>>>> The current implementation introduces a race condition, which >>>>>>>> can cause pageflip completion events to be sent out one vblank >>>>>>>> too early, thereby confusing userspace and causing flicker: >>>>>>>> >>>>>>>> prepare_flip_isr(): >>>>>>>> >>>>>>>> 1. Pageflip programming takes the ddev->event_lock. >>>>>>>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED >>>>>>>> 3. Releases ddev->event_lock. >>>>>>>> >>>>>>>> --> Deadline for surface address regs double-buffering passes on >>>>>>>> target pipe. >>>>>>>> >>>>>>>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip >>>>>>>> into hw, but too late for current vblank. >>>>>>>> >>>>>>>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete >>>>>>>> in current vblank due to missing the double-buffering deadline >>>>>>>> by a tiny bit. >>>>>>>> >>>>>>>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires, >>>>>>>> dm_dcn_crtc_high_irq() gets called. >>>>>>>> >>>>>>>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the >>>>>>>> pageflip has been completed/will complete in this vblank and >>>>>>>> sends out pageflip completion event to userspace and resets >>>>>>>> pflip_status = AMDGPU_FLIP_NONE. >>>>>>>> >>>>>>>> => Flip completion event sent out one vblank too early. >>>>>>>> >>>>>>>> This behaviour has been observed during my testing with measurement >>>>>>>> hardware a couple of time. >>>>>>>> >>>>>>>> The commit message says that the extra flip event code was added to >>>>>>>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip >>>>>>>> events >>>>>>>> in case the pflip irq doesn't fire, because the "DCH HUBP" >>>>>>>> component >>>>>>>> is clock gated and doesn't fire pflip irqs in that state. Also that >>>>>>>> this clock gating may happen if no planes are active. According to >>>>>>>> Nicholas, the clock gating can also happen if psr is active, and >>>>>>>> the >>>>>>>> gating is controlled independently by the hardware, so difficult to >>>>>>>> detect if and when the completion code in above commit is needed. >>>>>>>> >>>>>>>> This patch tries the following solution: It only executes the extra >>>>>>>> pflip >>>>>>>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports >>>>>>>> that there aren't any surface updated pending in the >>>>>>>> double-buffered >>>>>>>> surface scanout address registers. Otherwise it leaves pflip >>>>>>>> completion >>>>>>>> to the pflip irq handler, for a more race-free experience. >>>>>>>> >>>>>>>> This would only guard against the order of events mentioned above. >>>>>>>> If Step 5 (VSTARTUP trigger) happens before step 4 then this >>>>>>>> won't help >>>>>>>> at all, because 1-3 + 5 might happen even without the hw being >>>>>>>> programmed >>>>>>>> at all, ie. no surface update pending because none yet programmed >>>>>>>> into hw. >>>>>>>> >>>>>>>> Therefore this patch also changes locking in >>>>>>>> amdgpu_dm_commit_planes(), >>>>>>>> so that prepare_flip_isr() and dc_commit_updates_for_stream() >>>>>>>> are done >>>>>>>> under event_lock protection within the same critical section. >>>>>>>> >>>>>>>> v2: Take Nicholas comments into account, try a different solution. >>>>>>>> >>>>>>>> Lightly tested on Polaris (locking) and Raven (the whole DCN >>>>>>>> stuff). >>>>>>>> Seems to work without causing obvious new trouble. >>>>>>> >>>>>>> Nick, any comments on this? Can we get this committed or do you >>>>>>> think >>>>>>> it needs additional rework? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Alex >>>>>> >>>>>> Hi Alex, Mario, >>>>>> >>>>>> This might be a little strange, but if we want to get this in as a >>>>>> fix >>>>>> for regressions caused by the original vblank and user events at >>>>>> vstartup patch then I'm actually going to give my reviewed by on the >>>>>> *v1* of this patch (but not this v2): >>>>>> >>>>>> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx> >>>>>> >>>>>> You can feel free to apply that one. >>>>>> >>>>>> Reason 1: After having thought about it some more I don't think we >>>>>> enable anything today that has hubp powered down at the same time we >>>>>> expect to be waiting for a flip - eg. DMCU powering down HUBP >>>>>> during PSR >>>>>> entry. Static screen interrupt should happen after that flip >>>>>> finishes I >>>>>> think. >>>>>> >>>>>> The CRTC can still be powered on with zero planes, and I don't >>>>>> think any >>>>>> userspace explicitly asks for vblank events in this case but it >>>>>> doesn't >>>>>> hurt to have the check. >>>>>> >>>>>> Reason 2: This new patch will need much more thorough testing from >>>>>> side >>>>>> to fully understand the consequences of locking the entire DC commit >>>>>> sequence. For just a page flip that sounds fine, but for anything >>>>>> more >>>>>> than (eg. full updates, modesets, etc) I don't think we want to be >>>>>> disabling interrupts for potentially many milliseconds. >>>>> >>>>> Ah! I was wondering where the attached splat comes from, but I think >>>>> this explains it: With this patch amdgpu_dm_commit_planes keeps the >>>>> pcrtc->dev->event_lock spinlock locked while calling >>>>> dc_commit_updates_for_stream, which ends up calling >>>>> smu_set_display_count, which tries to lock a mutex. >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> amd-gfx mailing list >>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>>> > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx