Re: Funky new vblank counter regressions in Linux 4.4-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/23/2015 09:04 PM, Harry Wentland wrote:
Hi Mario,

when we've had issues with this on amdgpu Christian fixed it by enabling
page flip irq all the time, rather than turning it on when usermode
request a flip and turning it back off after we handled it. I believe
that fix exists on radeon already. Michel should have more info on that.

See other comments inline.

Thanks,
Harry


On 2015-11-23 11:02 AM, Mario Kleiner wrote:
On 11/20/2015 04:42 AM, Alex Deucher wrote:
On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner
<mario.kleiner.de@xxxxxxxxx> wrote:
Hi Alex and Michel and Ville,

it's "fix vblank stuff" time again ;-)

Adding Harry from our display team.  He might be able to fill in the
blanks of on some of this better than I can.  It might also be worth
checking to see how our DAL (our new display code which is being
developed directly by our display team) code handles this.  It could
be that we are just missing register settings:

Thanks Alex! And hello Harry :)

http://cgit.freedesktop.org/~agd5f/linux/log/?h=DAL-wip

I'll have a look at this.

Additionally we've published full registers headers for the display
block:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/include/asic_reg/dce

The DCE8 stuff should generally apply back to DCE4.  If you have
questions about registers older asics not covered in the hw docs, let
me know.  Note the new headers are dword aligned rather than byte
aligned.


I've tested now with two different progressive modes on DCE3 and one
progressive mode on DCE4, the only cards i have atm. So far it seems
that the framecounter indeed increments when the vpos from the scanout
position query jumps back to zero. Attached for reference is my
current patch to radeon-kms. This one seems to work reliably so far,
also if i enable the immediate vblank irq disable which we so far only
used on intel-kms.

But according to this patch the framecounter increment happens
somewhere in the middle of vblank.

Essentially from the vpos extracted from
EVERGREEN_CRTC_STATUS_POSITION which defines start of vblank ("Start"
part of EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1
the framecounter stays on the count of the old previous scanout cycle.
Then when vpos wraps to zero the framecounter increments by 1. And
then we have another couple of dozen lines inside vblank until
reaching the "End" part of EVERGREEN_CRTC_V_BLANK_START_END and
entering active scanout for the new frame.

So the position of observed framecounter increment seems to be not
close to the end of vblank ("start of frame" indicator?), but a couple
of scanlines after start of vblank.

E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,
vtotal is 1481, end of vblank is 38. So i enter the vblank and see the
old framecounter for vpos = 1478, 1479, 1480, then it wraps to 0 and
the framecounter increments by 1, then 38 scanlines later the vblank
ends.

So i seem to have something that seems to work in practice and this
"increment framecounter if vpos wraps back to zero" behavior makes
some sense. It just doesn't conform to what those descriptions for
start_line and "start of frame" indicator describe?

This is correct. Our HW doesn't really have a vblank counter but a frame
counter. The framecounter increments at the start of vsync, which is
when we wrap to zero which doesn't coincide with the start of vblank.

What we're trying to do with get_vblank_counter isn't the same as what
framecount gives us, but we could probably do something like:

if (get_scanout_pos > vblank_start)
   return frame_count + 1;
else
   return frame_count;


Great! That's what my current patch does and it seems to work well with different video modes on both DCE3 and DCE4. So theory agrees with practice again :) - thanks for clarifying this.

So the other problem we have since forever is the vblank irq firing before the start of vblank. We are typically 1-2 scanlines before vblank_start when we sample the scanout position and framecounter. This needs a slightly ugly workaround. Is there a way, maybe some config register, to fire the irq at leading edge of vblank instead of a bit too early?

thanks,
-mario

Harry

I'll test with a few more video modes.

thanks,
-mario



Ville's changes to the DRM's drm_handle_vblank() /
drm_update_vblank_count()
code in Linux 4.4 not only made that code more elegant, but also
removed the
robustness against the vblank irq quirks in AMD hw and similar
hardware. So
now i get tons of off-by-one errors and

"[   432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip
completion event has impossible msc 24803 < target_msc 24804" XOrg
messages
from that kernel.

One of the reasons for trouble is that AMD hw quirk where the hw
fires an
extra vblank irq shortly after vblank irq's get enabled, not
synchronized to
vblank, but typically in the middle of active scanout, so we get a
redundant
call to drm_handle_vblank in the middle of scanout.

To fix that i have a minor patch to make drm_update_vblank_count()
again
robust against such redundant calls, which i will send out later to the
mailing list. Diff attached for reference.

The second quirk of AMD hw is that the vblank interrupt fires a few
scanlines before start of vblank, so drm_handle_vblank ->
drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets
called
before the start of the vblank for which the new vblank count should be
queried.

The third problem is that the DRM vblank handling always had the
assumption
that hardware vblank counters would essentially increment at leading
edge of
vblank - basically in sync with the firing of the vblank irq, so
that a hw
counter readout from within the vblank irq handler would always
deliver the
new incremented value. If this assumption is violated then the
counting by
use of the hw counter gets unreliable, because depending on random
small
delays in irq handling the code may end up sampling the hw counter
pre- or
post-increment, leading to inconsistent updating and funky bugs. It
just so
happens that AMD hardware doesn't increment the hw counter at
leading edge
of vblank, so stuff falls apart.

So to fix those two problems i'm tinkering with cooking the hw vblank
counter value returned by radeon_get_vblank_counter_kms() to make it
appear
as if the counter incremented at leading edge of vblank in sync with
vblank
irq.

It almost sort of works on the rs600 code path, but i need a bit of
info
from you:

1. There's this register from the old specs for m76.pdf, which is
not part
of the current register defines for radeon-kms:

"D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"

It contains the lower 16 bits of framecounter and the 13 bits of
vertical
scanout position. It seems to give the same readings as the 24 bit
R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
would
come handy.

Does Evergreen and later have a same/similar register and where is it?

Yes.  CRTC_STATUS_VF_COUNT

CRTC:CRTC_STATUS_VF_COUNT  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32  ·
  [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2]
GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4]
GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c
DESCRIPTION: Current composite vertical and frame count for CRTC
Field Name   Bits    Default   Description
CRTC_VF_COUNT
(Access: R)    28:0    0x0    Reports current vertical and frame count


2. The hw framecounter seems to increment when the vertical scanout
position
wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i
tested so
far. Is this so on all asics? And is the hw counter increment happening
exactly at the moment that vertical scanout position jumps back to
zero, ie.
both events are driven by the same signal? Or is the framecounter
increment
just happening somewhere inside either scanline VTOTAL-1 or scanline 0?

Unless Harry knows, we can ask the hw team, but I doubt they would
have changed it.  I think it's tied to the start line which is
controlled by this register:
CRTC:CRTC_START_LINE_CONTROL  ·  [R/W]  ·  32 bits  ·  Access: 8/16/32
  ·  [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2]
GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4]
GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc
DESCRIPTION: move start_line signal earlier by 1 line in CRTC
Field Name   Bits          Default         Description
CRTC_PROGRESSIVE_START_LINE_EARLY         0    0x0    move start_line
signal by 1 line eariler in progressive mode

CRTC_INTERLACE_START_LINE_EARLY         8     0x1     move start_line
signal by 1 line earlier in interlaced timing mode

CRTC_ADVANCED_START_LINE_POSITION         19:16     0x4 Advanced
Start Line position respect to not VBLANK. Default of 4 means the
Advanced Start Line is 4 lines before the first non VBLANK line

The following info I dug up internally may be useful:

The position of the CRTC output timing generator when the “start of
frame” indicator occurs depends on several conditions.  These
conditions are whether the timing generator is outputting timing for a
progressive or interlaced display mode, whether the scaler is enabled,
and if so, whether the register to select an earlier than normal
“start of frame” indicator is set.  The “start of frame” indicator
typically occurs 2 lines before the vertical blank end position
(DxCRTC_V_BLANK_END) is reached

When interlaced output display modes are enabled
(DxCRTC_INTERLACE_ENABLE = 1) and the CRTC timing generator is enabled
(DxCRTC_MASTER_EN = 1), the timing generator’s vertical counter counts
by 2 for both the even fields and odd fields of each frame.  For both
the even fields and the odd fields of interlaced output modes, the
“start of frame” indicator occurs 2 lines before the end of the
vertical blank occurs (DxCRTC_V_BLANK_END – 4 or DxCRTC_V_BLANK_END –
3 depending on the field since the vertical counter counts by 2 in
interlaced), since the vertical counter counts by 2 in this mode).
There is one exception to the previous statement.  If scaling is
enabled (DxSCL_SCALE_EN = 1) and the early start line is enabled
(DxCRTC_INTERLACE_START_LINE_EARLY = 1) in interlaced output mode then
the “start of frame” indicator happens 3 lines before the end of the
vertical blank (DxCRTC_V_BLANK_END – 6 or DxCRTC_V_BLANK_END – 5
depending on the field since the vertical counter counts by 2 in
interlaced).

When progressive output display modes are enabled
(DxCRTC_INTERLACE_ENABLE = 0) and the CRTC timing generator is enabled
(DxCRTC_MASTER_EN = 1), the “start of frame” indicator occurs 2 lines
before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2).
Similar to interlaced output mode, there is one exception to the
previous statement.  If scaling is enabled (DxSCL_SCALE_EN = 1) and
the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =
1) in progressive output mode then the “start of frame” indicator
happens 3 lines before the end of the vertical blank
(DxCRTC_V_BLANK_END – 3)

I hope this helps,

Alex



If we can fix this and get it into rc2 or rc3 then we could avoid a bad
regression and with a bit of luck at the same time improve by being
able to
set dev->vblank_disable_immediate = true then and allow vblank irqs
to get
turned off more aggressively for a bit of extra power saving.

thanks,
-mario

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux