Re: [PATCH 00/11] Implement DDB algorithm and WM cleanup

Mahesh Kumar <mahesh1.kumar@xxxxxxxxx> · Fri, 12 May 2017 13:55:37 +0530

Hi Matt,

Thanks for the review.

On Friday 12 May 2017 05:51 AM, Matt Roper wrote:
On Mon, May 08, 2017 at 05:18:51PM +0530, Mahesh Kumar wrote:
This series implements a new DDB allocation algorithm to handle cases
where we have sufficient DDB available to enable multiple planes, but
the current algorithm does not divide it properly among the planes, so
we end up failing the flip.
It also takes care of enabling the same watermark level for each
plane, for efficient power saving.
The series also fixes and cleans up a few bugs in the present code.

There are two steps in the current WM programming.

1. Calculate the minimum number of blocks required for a WM level to be
enabled. For a 1440x2560 panel we need a minimum of 41 blocks to enable
WM0. This step doesn't use the vertical size; it depends only on the
pipe drain rate and the plane horizontal size, as per the current Bspec
algorithm.
So all the planes below need a minimum of 41 blocks to enable WM0:
     Plane 1  - 1440x2560        -    Min blocks to enable WM0 = 41
     Plane 2  - 1440x2560        -    Min blocks to enable WM0 = 41
     Plane 3  - 1440x48          -    Min blocks to enable WM0 = 41
     Plane 4  - 1440x96          -    Min blocks to enable WM0 = 41

2. Number of blocks allotted by the driver
     The driver allocates 12 blocks for Plane 3 and 16 for Plane 4

     Total Dbuf Available = 508
     Dbuf Available after 32 blocks for cursor = 508 - (32)  = 476
Given the dbuf size of 508, I assume this example is for Broxton
hardware, right?  In that case, you wouldn't actually be able to use the
cursor plane since Plane 4 (1440x96) is mutually exclusive with the
cursor, so there wouldn't be a need to reserve these 32 blocks.   I
guess there's also the issue that the upstream driver can't actually
expose/use Plane 4 at all today.
Yes, this example is for Broxton. At the time of this writeup, only the
optimization of using the 4th plane instead of the cursor plane was in
place, but the code was still allocating DDB for the cursor.
True, upstream doesn't expose the 4th plane; this was per a local
optimization for Broxton.

Regards,
-Mahesh

That said, your overall example here still gets the important points
across and is very much appreciated.

     allocate minimum blocks for each plane 8 * 4 = 32
     remaining blocks = 476 - 32 = 444
     Relative Data Rate for Planes
        Plane 1  =  1440 * 2560 * 3  =  11059200
        Plane 2  =  1440 * 2560 * 3  =  11059200
        Plane 3  =  1440 * 48   * 3  =  207360
        Plane 4  =  1440 * 96   * 3  =  414720
        Total Relative BW            =  22740480

-   Allocate Buffer
     buffer allocation = (Plane relative data rate / total data rate)
                         * total remaining DDB + minimum plane DDB
      Plane 1  buffer allocation = (11059200 / 22740480) * 444 + 8 = 223
      Plane 2  buffer allocation = (11059200 / 22740480) * 444 + 8 = 223
      Plane 3  buffer allocation = (207360   / 22740480) * 444 + 8 = 12
      Plane 4  buffer allocation = (414720   / 22740480) * 444 + 8 = 16

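In this case the driver is forced to disable Planes 3 and 4, because
their allocations (12 and 16 blocks) are below the 41 blocks needed to
enable WM0. The driver needs a more efficient way to allocate the
buffer, one that is optimal for power.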
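
The arithmetic of the current proportional split can be reproduced with
a short sketch. This is only an illustration of the numbers in this
example, not the i915 code: all names are hypothetical, and the factor
of 3 in the relative data rate and the integer truncation are
assumptions taken from the figures quoted above.

```python
# Illustrative sketch of the current proportional DDB split.
# Names and structure are hypothetical; values come from the example above.

TOTAL_DDB = 508        # total DBuf blocks in the example
CURSOR_BLOCKS = 32     # blocks reserved for the cursor
MIN_PLANE_BLOCKS = 8   # minimum blocks handed to each plane first
WM0_MIN_BLOCKS = 41    # blocks each plane needs to enable WM0
RATE_FACTOR = 3        # multiplier used in the example's relative data rate

planes = {
    "Plane 1": (1440, 2560),
    "Plane 2": (1440, 2560),
    "Plane 3": (1440, 48),
    "Plane 4": (1440, 96),
}


def proportional_allocation(planes):
    """Split the remaining blocks in proportion to relative data rate."""
    available = TOTAL_DDB - CURSOR_BLOCKS                    # 476
    remaining = available - MIN_PLANE_BLOCKS * len(planes)   # 444
    rates = {name: w * h * RATE_FACTOR for name, (w, h) in planes.items()}
    total_rate = sum(rates.values())
    # Integer division truncates, matching the rounded-down figures above.
    return {
        name: rate * remaining // total_rate + MIN_PLANE_BLOCKS
        for name, rate in rates.items()
    }


allocation = proportional_allocation(planes)
# Planes whose share is too small to enable even WM0.
starved = [name for name, blocks in allocation.items()
           if blocks < WM0_MIN_BLOCKS]
```

Here `allocation` holds the per-plane block counts and `starved` lists
the planes that cannot reach the WM0 minimum, which in this example are
the two short planes.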

New Algorithm suggested by HW team is:

1. Calculate minimum buffer allocations for each plane and for each
     watermark level

2. Add up the minimum buffer allocations required for enabling each
     WM level (up to WM7) across all the planes

Level 0 =  41 + 41 + 41 + 41  = 164
Level 1 =  42 + 42 + 42 + 42  = 168
Level 2 =  42 + 42 + 42 + 42  = 168
Level 3 =  94 + 94 + 94 + 94  = 376
Level 4 =  94 + 94 + 94 + 94  = 376
Level 5 =  94 + 94 + 94 + 94  = 376
Level 6 =  94 + 94 + 94 + 94  = 376
Level 7 =  94 + 94 + 94 + 94  = 376

3. Check how many blocks are available and enable the best case that
fits. In this case, since we have 476 blocks, we can enable WM0-7 on
all 4 planes.
If we had only 200 blocks available, the best-case allocation would be
to enable up to Level 2, which requires 168 blocks.
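
The level selection in step 3 can be sketched as follows. This is a
minimal illustration using the per-level minimums from the table above,
not the algorithm as implemented in the series: the function name is
hypothetical, and it assumes every plane has the same per-level minimum
(as in this example) and that requirements never decrease with level.

```python
# Illustrative sketch of picking the highest WM level that fits.
# Names are hypothetical; per-level minimums come from the table above.

LEVEL_MIN_BLOCKS = [41, 42, 42, 94, 94, 94, 94, 94]  # per plane, WM0..WM7
NUM_PLANES = 4


def best_wm_level(available_blocks,
                  per_plane_min=LEVEL_MIN_BLOCKS,
                  num_planes=NUM_PLANES):
    """Return the highest level whose total minimum fits, or None."""
    best = None
    for level, blocks in enumerate(per_plane_min):
        if blocks * num_planes <= available_blocks:
            best = level
        else:
            # Assumes requirements are non-decreasing across levels.
            break
    return best


level_with_476 = best_wm_level(476)  # all levels fit (376 <= 476)
level_with_200 = best_wm_level(200)  # Level 3 needs 376, so stop at Level 2
level_with_100 = best_wm_level(100)  # even WM0 (164) does not fit
```

A result of `None` corresponds to the case where not even WM0 can be
enabled on all planes, so the configuration would have to be rejected.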
It's probably worth noting that the use cases that are most likely to
benefit from this are those with large differences in the height of the
'shortest' plane vs the height of the 'tallest' plane.  It's the
blind proportional distribution of remaining blocks in the current
algorithm that prevents 'short' planes from reaching their minimum block
requirements for various watermark levels (and if they can't even reach
the WM0 minimum, then the plane can't be used at all).

There will certainly still be cases where the overall display
configuration (with lots of pipes and planes in use) simply requires
more blocks than the hardware has to even reach WM0, no matter how we
slice up the limited DDB size, but the changes here will definitely help
prevent us from rejecting atomic commits for some configurations we
actually could handle.

Matt

Mahesh Kumar (11):
   drm/i915: fix naming of fixed_16_16 wrapper.
   drm/i915: Add more wrapper for fixed_point_16_16 operations
   drm/i915: Use fixed_16_16 wrapper for division operation
   drm/i915/skl+: calculate pixel_rate & relative_data_rate in fixed
     point
   drm/i915/skl: Fail the flip if no FB for WM calculation
   drm/i915/skl+: no need to memset again
   drm/i915/skl+: Fail the flip if ddb min requirement exceeds pipe
     allocation
   drm/i915/skl+: Watermark calculation cleanup
   drm/i915/skl+: use linetime latency if ddb size is not available
   drm/i915/skl: New ddb allocation algorithm
   drm/i915/skl+: consider max supported plane pixel rate while scaling

  drivers/gpu/drm/i915/i915_drv.h      |  56 +++-
  drivers/gpu/drm/i915/intel_display.c |   3 +
  drivers/gpu/drm/i915/intel_drv.h     |   2 +
  drivers/gpu/drm/i915/intel_pm.c      | 520 +++++++++++++++++++++++------------
  4 files changed, 395 insertions(+), 186 deletions(-)

--
2.11.0
