Hi Leo and others,
sorry for the late reply. I just spent some time looking at your patches and testing them on a Raven DCN-1.
I looked at how the vstartup line is computed in the dc_bandwidth_calcs etc., and added some DRM_DEBUG statements to the dm_dcn_crtc_high_irq and dm_pflip_high_irq handlers to print the scanline at which the handlers get invoked.
From my reading of the code and the test results, my understanding is that VSTARTUP always fires after the end of front-porch in VRR mode, so the dm_dcn_crtc_high_irq handler will only get invoked in the vsync/back-porch area? This is good and very important, as otherwise the vblank count and timestamp calculations would often be wrong (if the handler ever ran inside front-porch).
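To make the terminology concrete, here is a minimal sketch of how a scanline reported by such debug output can be sorted into the regions discussed in this mail. It is a standalone userspace model, not driver code: struct vtiming, enum vregion, scanline_region() and past_front_porch() are made-up names, the fields only mirror the vertical members of struct drm_display_mode, and it assumes scanlines count from 0 at the first active line. With VRR active the front porch is the part that gets stretched, so the values would have to describe the frame actually being scanned out.

```c
#include <assert.h>

/* Hypothetical model of one frame's vertical timing, in scanlines.
 * Field names mirror the vertical members of struct drm_display_mode. */
struct vtiming {
	int vdisplay;    /* first line of front porch (end of active area) */
	int vsync_start; /* first line of vsync (end of front porch) */
	int vsync_end;   /* first line of back porch (end of vsync) */
	int vtotal;      /* total number of lines in this frame */
};

enum vregion { V_ACTIVE, V_FRONT_PORCH, V_VSYNC, V_BACK_PORCH };

/* Classify a scanline; assumes line 0 is the first active line. */
static enum vregion scanline_region(const struct vtiming *t, int line)
{
	assert(t->vdisplay > 0 && t->vdisplay <= t->vsync_start);
	assert(t->vsync_start <= t->vsync_end && t->vsync_end <= t->vtotal);
	assert(line >= 0 && line < t->vtotal);

	if (line < t->vdisplay)
		return V_ACTIVE;
	if (line < t->vsync_start)
		return V_FRONT_PORCH;
	if (line < t->vsync_end)
		return V_VSYNC;
	return V_BACK_PORCH;
}

/* Nonzero if a handler that ran at 'line' ran after end of front porch. */
static int past_front_porch(const struct vtiming *t, int line)
{
	enum vregion r = scanline_region(t, line);

	return r == V_VSYNC || r == V_BACK_PORCH;
}
```

Feeding the scanline printed by each handler invocation into past_front_porch() gives a quick yes/no check of the "never inside front-porch" expectation.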
Could you give me some overview of which interrupts / hw events happen when on DCN vs DCE? I intend to spend quite a bit of quality time in December playing with the freesync code in DC to see if I can hack up some proof-of-concept for precisely timed pageflips - the approach Harry suggested in his XDC2019 talk, which I finally found time to watch. I think with the highly precise vblank and pageflip timestamps we should be able to implement this precisely without the need for (jittery) software timers, just some extensions to DRR hw programming and some trickery similar to what below-the-range (BTR) support does. That would be so cool, especially for neuro-science/vision-science/medical research applications.
My rough understanding so far for DCN seems to be:
1. Pageflips can execute in front-porch, ie. the register double-buffering can switch there. Can they also still execute after front-porch? How far into vsync/back-porch? I assume at some point close to the end of back-porch they can't anymore, because after a flip the line buffer needs time to prefetch the new pixeldata from the new scanout buffer [a]?
2. The VSTARTUP interrupt/event in VRR mode happens somewhere programmable after end of front-porch (suggested by the bandwidth calc code), but before VUPDATE? Is VSTARTUP the last point at which double-buffering for a pageflip can happen, ie. after that the line-buffer refill for the next frame starts, ie. [a]?
3. The VUPDATE interrupt/event marks the end of vblank? And that's where double-buffering / switch of new values for the DRR registers happens? So DRR values programmed before VUPDATE will take effect after VUPDATE and thereby affect the vblank after the current one ie. after the following video frame?
Is this correct? And how does it differ from Vega/DCE-12 and older (<= Polaris / <= DCE-11)? I remember from earlier this year that BTR works much better on DCN (tested) and DCE-12 (presumably, don't have hw to test) than it does on DCE-11 and earlier. This was due to different behaviour of when the DRR programming takes effect, allowing for much quicker switching on DCN. I'd like to understand in detail how the DRR switching/latching/double-buffering differs, if one of you can enlighten me.
There's one thing about this patch though that I think is not right. Sending pageflip completion events from within dm_dcn_crtc_high_irq() seems to be both unnecessary and a possible source of wrong pageflip events/timestamps and visual glitches due to races?
Two cases:
a) If a pageflip completes in front porch and the pageflip handler dm_pflip_high_irq() executes while in front-porch, it will queue the proper pageflip event for later delivery to user space by drm_crtc_handle_vblank() which is called by dm_dcn_crtc_high_irq() already.
b) If dm_pflip_high_irq() executes after front-porch (pageflip completes in back-porch if this is possible), it will deliver the pageflip event itself after updating the vblank count and timestamps correctly via drm_crtc_accurate_vblank_count().
There isn't a need for the extra code in your patch (if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {...}) and indeed i can just comment it out and everything works fine.
I think the code could even be harmful if a pageflip gets queued into the hardware before invocation of dm_dcn_crtc_high_irq() (ie. a bit before VSTARTUP + irq handling delay), but after missing the deadline for double-buffering of the hardware's primary surface base address registers. You could end up setting acrtc->pflip_status = AMDGPU_FLIP_SUBMITTED, missing the hw double-buffering deadline, and then dm_dcn_crtc_high_irq() would decide to send out a pageflip completion event to userspace for a flip that hasn't actually taken place in the hw in this vblank. Userspace would then misschedule its presentation due to the wrong pageflip event / timestamp, and you'd end up with the previously rendered image presented one scanout cycle too long, and the current image silently dropped and never displayed!
Indeed debug output i added shows that the dm_pflip_high_irq() handler essentially turns into doing nothing with your patch applied, so pageflip completion events sent to user space no longer correspond to true hw flips.
I have some hw measuring equipment to verify flip timing independent of the driver and during a few short test runs i think i observed this glitch at least once, suggesting the problem is real.
thanks,
-mario
On Tue, Nov 5, 2019 at 7:32 PM Li, Sun peng (Leo) <Sunpeng.Li@xxxxxxx> wrote:
On 2019-11-05 11:15 a.m., Kazlauskas, Nicholas wrote:
> On 2019-11-05 10:34 a.m., sunpeng.li@xxxxxxx wrote:
>> From: Leo Li <sunpeng.li@xxxxxxx>
>>
>> [Why]
>>
>> For DCN hardware, the crtc_high_irq handler is assigned to the vstartup
>> interrupt. This is different from DCE, which has it assigned to vblank
>> start.
>>
>> We'd like to send vblank and user events at vstartup because:
>>
>> * It happens close enough to vupdate - the point of no return for HW.
>>
>> * It is programmed as lines relative to vblank end - i.e. it is not in
>> the variable portion when VRR is enabled. We should signal user
>> events here.
>>
>> * The pflip interrupt responsible for sending user events today only
>> fires if the DCH HUBP component is not clock gated. In situations
>> where planes are disabled - but the CRTC is enabled - user events won't
>> be sent out, leading to flip done timeouts.
>>
>> Consequently, this makes vupdate on DCN hardware redundant. It will be
>> removed in the next change.
>>
>> [How]
>>
>> Add a DCN-specific crtc_high_irq handler, and hook it to the VStartup
>> signal. Inside the DCN handler, we send off user events if the pflip
>> handler hasn't already done so.
>>
>> Signed-off-by: Leo Li <sunpeng.li@xxxxxxx>
>> ---
>> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 65 ++++++++++++++++++-
>> 1 file changed, 64 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index 00017b91c91a..256a23a0ec28 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -485,6 +485,69 @@ static void dm_crtc_high_irq(void *interrupt_params)
>> }
>> }
>>
>> +
>> +/**
>> + * dm_dcn_crtc_high_irq() - Handles VStartup interrupt for DCN generation ASICs
>> + * @interrupt_params: interrupt parameters
>> + *
>> + * Notify DRM's vblank event handler at VSTARTUP
>> + *
>> + * Unlike DCE hardware, we trigger the handler at VSTARTUP, at which:
>> + * * We are close enough to VUPDATE - the point of no return for hw
>> + * * We are in the fixed portion of variable front porch when vrr is enabled
>> + * * We are before VUPDATE, where double-buffered vrr registers are swapped
>> + *
>> + * It is therefore the correct place to signal vblank, send user flip events,
>> + * and update VRR.
>> + */
>> +static void dm_dcn_crtc_high_irq(void *interrupt_params)
>> +{
>> + struct common_irq_params *irq_params = interrupt_params;
>> + struct amdgpu_device *adev = irq_params->adev;
>> + struct amdgpu_crtc *acrtc;
>> + struct dm_crtc_state *acrtc_state;
>> + unsigned long flags;
>> +
>> + acrtc = get_crtc_by_otg_inst(adev, irq_params->irq_src - IRQ_TYPE_VBLANK);
>> +
>> + if (!acrtc)
>> + return;
>> +
>> + acrtc_state = to_dm_crtc_state(acrtc->base.state);
>> +
>> + DRM_DEBUG_DRIVER("crtc:%d, vupdate-vrr:%d\n", acrtc->crtc_id,
>> + amdgpu_dm_vrr_active(acrtc_state));
>> +
>> + amdgpu_dm_crtc_handle_crc_irq(&acrtc->base);
>> + drm_crtc_handle_vblank(&acrtc->base);
>
> Shouldn't this be the other way around? Don't we want the CRC sent back
> to userspace to have the updated vblank counter?
>
> This is how it worked before at least.
>
> Other than that, this patch looks fine to me.
>
> Nicholas Kazlauskas
Looks like we're doing a crtc_accurate_vblank_count() inside the crc handler,
so I don't think order matters here.
Leo
>
>> +
>> + spin_lock_irqsave(&adev->ddev->event_lock, flags);
>> +
>> + if (acrtc_state->vrr_params.supported &&
>> + acrtc_state->freesync_config.state == VRR_STATE_ACTIVE_VARIABLE) {
>> + mod_freesync_handle_v_update(
>> + adev->dm.freesync_module,
>> + acrtc_state->stream,
>> + &acrtc_state->vrr_params);
>> +
>> + dc_stream_adjust_vmin_vmax(
>> + adev->dm.dc,
>> + acrtc_state->stream,
>> + &acrtc_state->vrr_params.adjust);
>> + }
>> +
>> + if (acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED) {
>> + if (acrtc->event) {
>> + drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>> + acrtc->event = NULL;
>> + drm_crtc_vblank_put(&acrtc->base);
>> + }
>> + acrtc->pflip_status = AMDGPU_FLIP_NONE;
>> + }
>> +
>> + spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
>> +}
>> +
>> static int dm_set_clockgating_state(void *handle,
>> enum amd_clockgating_state state)
>> {
>> @@ -2175,7 +2238,7 @@ static int dcn10_register_irq_handlers(struct amdgpu_device *adev)
>> c_irq_params->irq_src = int_params.irq_source;
>>
>> amdgpu_dm_irq_register_interrupt(adev, &int_params,
>> - dm_crtc_high_irq, c_irq_params);
>> + dm_dcn_crtc_high_irq, c_irq_params);
>> }
>>
>> /* Use VUPDATE_NO_LOCK interrupt on DCN, which seems to correspond to
>> --
>> 2.23.0
>>
>
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx