Mario or Nick any thoughts? Alex On Mon, May 4, 2020 at 1:35 PM Matt Coffin <mcoffin13@xxxxxxxxx> wrote: > > Hey guys, > > This is still an issue for me, and I'm still having to run a patch to > revert this as of 5.7-rc4. To avoid breaking a lot of people's Navi > setups in 5.7, is there any news on this? Has anyone else at the very > least been able to reproduce the problem? > > It happens for me in every single program that mesa allows to utilize > variable refresh rates, and reverting it "fixes" the issue. > > Cheers, and sorry for the extra email, just making sure this is still on > someone's radar, > Matt > > On 4/14/20 5:32 PM, Matt Coffin wrote: > > Hey everyone, > > > > This patch broke variable refresh rate in games (all that I've tried so > > far... Project CARS 2, DiRT Rally 2.0, Assetto Corsa Competizione) as > > well as a simple freesync tester application. > > > > FreeSync tester I've been using: https://github.com/Nixola/VRRTest > > > > I'm not at all familiar with the page flipping code, so it would take me > > a long time to find the *right* way to fix it, but does someone else see > > why it would do that? > > > > The symptom is that the refresh rate of the display constantly bounces > > between the two ends of the FreeSync range (for me 40 -> 144), and the > > game stutters like a madman. > > > > Any help on where to start, ideas on how to fix it (other than just > > revert this commit, which I've done in the interim), or alternative > > patches would be appreciated. > > > > Thanks in advance for the work/help, > > Matt > > > > On 3/13/20 8:42 AM, Michel Dänzer wrote: > >> On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote: > >>> On 2020-03-12 10:32 a.m., Alex Deucher wrote: > >>>> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner > >>>> <mario.kleiner.de@xxxxxxxxx> wrote: > >>>>> > >>>>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user > >>>>> events at vsartup for DCN")' introduces a new way of pageflip > >>>>> completion handling for DCN, and some trouble. > >>>>> > >>>>> The current implementation introduces a race condition, which > >>>>> can cause pageflip completion events to be sent out one vblank > >>>>> too early, thereby confusing userspace and causing flicker: > >>>>> > >>>>> prepare_flip_isr(): > >>>>> > >>>>> 1. Pageflip programming takes the ddev->event_lock. > >>>>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED > >>>>> 3. Releases ddev->event_lock. > >>>>> > >>>>> --> Deadline for surface address regs double-buffering passes on > >>>>> target pipe. > >>>>> > >>>>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip > >>>>> into hw, but too late for current vblank. > >>>>> > >>>>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete > >>>>> in current vblank due to missing the double-buffering deadline > >>>>> by a tiny bit. > >>>>> > >>>>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires, > >>>>> dm_dcn_crtc_high_irq() gets called. > >>>>> > >>>>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the > >>>>> pageflip has been completed/will complete in this vblank and > >>>>> sends out pageflip completion event to userspace and resets > >>>>> pflip_status = AMDGPU_FLIP_NONE. > >>>>> > >>>>> => Flip completion event sent out one vblank too early. > >>>>> > >>>>> This behaviour has been observed during my testing with measurement > >>>>> hardware a couple of time. > >>>>> > >>>>> The commit message says that the extra flip event code was added to > >>>>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events > >>>>> in case the pflip irq doesn't fire, because the "DCH HUBP" component > >>>>> is clock gated and doesn't fire pflip irqs in that state. Also that > >>>>> this clock gating may happen if no planes are active. According to > >>>>> Nicholas, the clock gating can also happen if psr is active, and the > >>>>> gating is controlled independently by the hardware, so difficult to > >>>>> detect if and when the completion code in above commit is needed. > >>>>> > >>>>> This patch tries the following solution: It only executes the extra > >>>>> pflip > >>>>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports > >>>>> that there aren't any surface updated pending in the double-buffered > >>>>> surface scanout address registers. Otherwise it leaves pflip completion > >>>>> to the pflip irq handler, for a more race-free experience. > >>>>> > >>>>> This would only guard against the order of events mentioned above. > >>>>> If Step 5 (VSTARTUP trigger) happens before step 4 then this won't help > >>>>> at all, because 1-3 + 5 might happen even without the hw being > >>>>> programmed > >>>>> at all, ie. no surface update pending because none yet programmed > >>>>> into hw. > >>>>> > >>>>> Therefore this patch also changes locking in amdgpu_dm_commit_planes(), > >>>>> so that prepare_flip_isr() and dc_commit_updates_for_stream() are done > >>>>> under event_lock protection within the same critical section. > >>>>> > >>>>> v2: Take Nicholas comments into account, try a different solution. > >>>>> > >>>>> Lightly tested on Polaris (locking) and Raven (the whole DCN stuff). > >>>>> Seems to work without causing obvious new trouble. > >>>> > >>>> Nick, any comments on this? Can we get this committed or do you think > >>>> it needs additional rework? > >>>> > >>>> Thanks, > >>>> > >>>> Alex > >>> > >>> Hi Alex, Mario, > >>> > >>> This might be a little strange, but if we want to get this in as a fix > >>> for regressions caused by the original vblank and user events at > >>> vstartup patch then I'm actually going to give my reviewed by on the > >>> *v1* of this patch (but not this v2): > >>> > >>> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx> > >>> > >>> You can feel free to apply that one. > >>> > >>> Reason 1: After having thought about it some more I don't think we > >>> enable anything today that has hubp powered down at the same time we > >>> expect to be waiting for a flip - eg. DMCU powering down HUBP during PSR > >>> entry. Static screen interrupt should happen after that flip finishes I > >>> think. > >>> > >>> The CRTC can still be powered on with zero planes, and I don't think any > >>> userspace explicitly asks for vblank events in this case but it doesn't > >>> hurt to have the check. > >>> > >>> Reason 2: This new patch will need much more thorough testing from side > >>> to fully understand the consequences of locking the entire DC commit > >>> sequence. For just a page flip that sounds fine, but for anything more > >>> than (eg. full updates, modesets, etc) I don't think we want to be > >>> disabling interrupts for potentially many milliseconds. > >> > >> Ah! I was wondering where the attached splat comes from, but I think > >> this explains it: With this patch amdgpu_dm_commit_planes keeps the > >> pcrtc->dev->event_lock spinlock locked while calling > >> dc_commit_updates_for_stream, which ends up calling > >> smu_set_display_count, which tries to lock a mutex. > >> > >> > >> > >> _______________________________________________ > >> amd-gfx mailing list > >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx > >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >> _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx