[RFC]: Arbitrated system memory bandwidth workarounds implementation for watermark.

Mahesh Kumar <mahesh1.kumar@xxxxxxxxx> · Mon, 27 Mar 2017 21:22:45 +0530



    Arbitrated system bandwidth workarounds for watermark.

     
    All GEN-9 based platforms require the watermark-related workarounds
    (WAs) to be enabled if the display memory bandwidth requirement
    exceeds XX% of the total available system memory bandwidth.

    This XX% depends on multiple factors.

    For example, if all enabled planes use X-tiled or linear memory,
    then

                        XX = 60

            whereas if any Y-tiled plane is enabled, then

                        XX = 20, and so on.
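    A minimal sketch of this check, assuming bandwidths expressed in
    any common unit and modelling only the two tiling cases quoted
    above (the other factors that influence XX are left out). The enum
    and function names are hypothetical, not taken from the i915 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiling classification of an enabled plane. */
enum plane_tiling { TILING_LINEAR, TILING_X, TILING_Y };

/*
 * Threshold XX in percent, using only the two examples above:
 * 20 if any enabled plane is Y-tiled, otherwise 60.
 */
static unsigned int wa_threshold_pct(const enum plane_tiling *tiling,
                                     unsigned int num_planes)
{
    for (unsigned int i = 0; i < num_planes; i++)
        if (tiling[i] == TILING_Y)
            return 20;
    return 60;
}

/* True if display_bw exceeds pct% of system_bw (both in the same unit). */
static bool wa_required(uint64_t display_bw, uint64_t system_bw,
                        unsigned int pct)
{
    assert(system_bw > 0);
    /* display_bw / system_bw > pct / 100, in integer arithmetic */
    return display_bw * 100 > system_bw * pct;
}
```

    For instance, a display load of 25% of system bandwidth needs the WA
    when a Y-tiled plane is enabled (threshold 20%) but not when all
    planes are linear or X-tiled (threshold 60%).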

    In the current implementation we always enable the maximum WA (i.e.
    add 15 us of latency during the WM calculation), irrespective of
    whether the workaround is required or not.
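    The difference between the current behaviour and the intended one
    can be sketched as follows; the function names are hypothetical and
    only the single 15 us step mentioned above is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extra latency added by the maximum WA, per the text above. */
#define WA_EXTRA_LATENCY_US 15u

/* Current behaviour: the extra latency is always added. */
static uint32_t wm_latency_current(uint32_t base_latency_us)
{
    return base_latency_us + WA_EXTRA_LATENCY_US;
}

/* Intended behaviour: add it only when the bandwidth check requires it. */
static uint32_t wm_latency_needed(uint32_t base_latency_us, bool wa_required)
{
    return base_latency_us + (wa_required ? WA_EXTRA_LATENCY_US : 0u);
}
```

    A larger latency input yields larger WM values, which is why always
    applying the WA is pessimistic when the bandwidth is well below XX%.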

    The total display bandwidth requirement is the sum of the display
    requirements of the individual pipes. To calculate the correct BW
    requirement, the plane configuration of every pipe must not change
    during the calculation.
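    A minimal sketch of the summation, assuming each plane's requirement
    is approximated as width x height x bytes-per-pixel x refresh rate
    (bytes/s); the structures and the MAX_PLANES limit are hypothetical.
    The sum is only meaningful if no pipe's planes change while it runs,
    which is what the options that follow try to guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLANES 4 /* hypothetical per-pipe plane limit */

struct plane_cfg {
    uint32_t width, height; /* source size in pixels */
    uint32_t cpp;           /* bytes per pixel */
    uint32_t refresh_hz;
};

struct pipe_cfg {
    unsigned int num_planes;
    struct plane_cfg planes[MAX_PLANES];
};

/* Approximate bytes/s fetched by one plane. */
static uint64_t plane_bw(const struct plane_cfg *p)
{
    return (uint64_t)p->width * p->height * p->cpp * p->refresh_hz;
}

/* Per-pipe requirement: sum over its enabled planes. */
static uint64_t pipe_bw(const struct pipe_cfg *pipe)
{
    uint64_t sum = 0;

    assert(pipe->num_planes <= MAX_PLANES);
    for (unsigned int i = 0; i < pipe->num_planes; i++)
        sum += plane_bw(&pipe->planes[i]);
    return sum;
}

/* Total display requirement: sum over all pipes. */
static uint64_t total_display_bw(const struct pipe_cfg *pipes, unsigned int n)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < n; i++)
        sum += pipe_bw(&pipes[i]);
    return sum;
}
```

    One 1920x1080 plane at 4 bytes per pixel and 60 Hz gives
    497664000 bytes/s, roughly 0.5 GB/s.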

    
    Many implementations are possible to implement and optimize the
    above requirement; I'm proposing a few options.

    Please review and let me know which option is better for
    implementing the WAs.

     
    Option 1:

    Use connection_mutex (this will change to an i915-specific lock once
      one is available in the atomic design) to serialize all the
      commits.

      If the memory bandwidth WA is changing, then get all crtc_states to
      calculate the watermark values.

      Pros:

      
        Each flip uses optimum WM values (no higher than the required
          values).
      
      Cons:

      
        This approach serializes all flips, so there will be a
          performance impact; with blocking commits the impact is even
          worse, e.g. with three displays refreshing at 30 fps, 60 fps
          and 90 fps.
        While a commit is in progress on the 30 fps display, all other
          flips are blocked, and frames on the 60 and 90 fps displays
          will be dropped or blocked.
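    A minimal user-space sketch of Option 1, assuming a single pthread
    mutex (the hypothetical global_commit_lock) as a stand-in for
    connection_mutex or the future i915-specific lock, and each pipe's
    requirement reduced to one number. Holding the lock across the
    whole check and commit is what keeps the sum consistent, and it is
    also why flips on other displays block.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPES 3

/* Hypothetical device-wide lock standing in for connection_mutex. */
static pthread_mutex_t global_commit_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t committed_bw[NUM_PIPES]; /* per-pipe bandwidth */
static bool wa_enabled;                  /* WA currently programmed */
static unsigned int wm_recomputed;       /* pipes recomputed by last commit */

static bool compute_wa(uint64_t total, uint64_t system_bw, unsigned int pct)
{
    return total * 100 > system_bw * pct;
}

/*
 * Option 1: the whole check + commit runs under one global lock, so no
 * other pipe can change its planes while the total is being summed.
 */
static void commit_option1(unsigned int pipe, uint64_t new_bw,
                           uint64_t system_bw, unsigned int pct)
{
    assert(pipe < NUM_PIPES);
    pthread_mutex_lock(&global_commit_lock);

    committed_bw[pipe] = new_bw;

    uint64_t total = 0;
    for (unsigned int i = 0; i < NUM_PIPES; i++)
        total += committed_bw[i];

    bool wa = compute_wa(total, system_bw, pct);
    if (wa != wa_enabled) {
        /* WA changes: pull in every crtc_state and recompute all WMs. */
        wa_enabled = wa;
        wm_recomputed = NUM_PIPES;
    } else {
        /* WA unchanged: only this pipe's WMs need recomputing. */
        wm_recomputed = 1;
    }

    /* ... program the hardware; a blocking commit also waits here ... */
    pthread_mutex_unlock(&global_commit_lock);
}
```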
      
    
    Option 2:

    Use two levels of system bandwidth check: once during calculation
      and again during commit.

      During intel_atomic_check (as part of compute_ddb), don’t hold any
      system-level mutex; instead, hold the WM mutex and compute the
      system bandwidth requirement. If the WA is changing, get the
      crtc_states of all other pipes and go ahead with the commit.

      During intel_atomic_commit, take wm_mutex again and recalculate the
      complete system bandwidth requirement. If the requirement has
      changed such that the computed WMs are no longer valid, fail the
      flip.

      Update the bandwidth requirement of each plane in the global state
      (dev_priv->wm) so that other flips don’t need to recalculate it.

       
      Pros:

      
        It reduces the critical section time.
        The available DDB is still used optimally, and optimum WM values
          are still used.
      
      Cons:

      
        If the memory bandwidth WA changes very frequently, there will be
          many flip failures, which will impact performance.
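    A minimal sketch of the two-phase check in Option 2. It assumes a
    pthread mutex standing in for wm_mutex, a hypothetical committed_bw[]
    array standing in for the per-plane data kept in dev_priv->wm, and
    -EAGAIN as the failure code for a flip whose WMs were computed for a
    stale WA decision; none of these names come from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPES 3

/* Hypothetical global WM state (dev_priv->wm in the proposal). */
static pthread_mutex_t wm_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t committed_bw[NUM_PIPES];

struct wm_check_result {
    unsigned int pipe;
    uint64_t new_bw;
    bool wa; /* WA state the WMs were computed for */
};

static bool wa_for(uint64_t total, uint64_t system_bw, unsigned int pct)
{
    return total * 100 > system_bw * pct;
}

/* Total if 'pipe' switches to new_bw; caller holds wm_mutex. */
static uint64_t total_with(unsigned int pipe, uint64_t new_bw)
{
    uint64_t total = new_bw;

    for (unsigned int i = 0; i < NUM_PIPES; i++)
        if (i != pipe)
            total += committed_bw[i];
    return total;
}

/* Phase 1 (atomic check): only wm_mutex, no system-wide lock. */
static struct wm_check_result check_option2(unsigned int pipe, uint64_t new_bw,
                                            uint64_t system_bw, unsigned int pct)
{
    struct wm_check_result r = { .pipe = pipe, .new_bw = new_bw };

    assert(pipe < NUM_PIPES);
    pthread_mutex_lock(&wm_mutex);
    r.wa = wa_for(total_with(pipe, new_bw), system_bw, pct);
    pthread_mutex_unlock(&wm_mutex);
    /* ... WMs for this pipe (and others, if the WA changes) use r.wa ... */
    return r;
}

/* Phase 2 (atomic commit): re-validate; fail if the WA decision changed. */
static int commit_option2(const struct wm_check_result *r,
                          uint64_t system_bw, unsigned int pct)
{
    int ret = 0;

    pthread_mutex_lock(&wm_mutex);
    if (wa_for(total_with(r->pipe, r->new_bw), system_bw, pct) != r->wa)
        ret = -EAGAIN;                     /* computed WMs are stale */
    else
        committed_bw[r->pipe] = r->new_bw; /* publish for other flips */
    pthread_mutex_unlock(&wm_mutex);
    return ret;
}
```

    In the usage pattern this models, two pipes pass phase 1 concurrently
    against the same old totals; whichever commits second sees the
    combined load cross the threshold and its flip is failed.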
      
       
    Option 3:

    Compute the maximum bandwidth requirement during modeset.

      E.g. if the modeset is 1080p @ 60 fps, the CRTC has at most 3
      planes, and the maximum supported downscale amount is “XX.YY”
      (defined as min(cdclk/crtc_clock, max(hscale x vscale))), then the
      maximum bandwidth requirement for the CRTC will be

      (1080p x 60 x 3 x XX.YY).
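    A minimal sketch of this worst-case formula, assuming "1080p" means
    1920 x 1080 pixels, the result is in pixels per second (bytes per
    pixel are not part of the formula above), and XX.YY is held in
    hundredths to stay in integer math. The function names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Downscale limit in hundredths: min(cdclk / crtc_clock, max(hscale x vscale)). */
static uint32_t max_downscale_x100(uint32_t cdclk_khz, uint32_t crtc_clock_khz,
                                   uint32_t max_hv_scale_x100)
{
    assert(crtc_clock_khz > 0);
    uint32_t clk_limit_x100 =
        (uint32_t)((uint64_t)cdclk_khz * 100 / crtc_clock_khz);

    return clk_limit_x100 < max_hv_scale_x100 ? clk_limit_x100
                                              : max_hv_scale_x100;
}

/* Worst case in pixels/s: width x height x refresh x planes x downscale. */
static uint64_t crtc_max_bw(uint32_t hdisplay, uint32_t vdisplay,
                            uint32_t refresh_hz, uint32_t max_planes,
                            uint32_t downscale_x100)
{
    return (uint64_t)hdisplay * vdisplay * refresh_hz * max_planes *
           downscale_x100 / 100;
}
```

    With cdclk at 337500 kHz, the 1080p60 pixel clock of 148500 kHz and
    an illustrative scaler limit of 4.00, the downscale is capped at
    2.27, giving 847272960 pixels/s for three planes.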

       
      Then, during a flip, if there is any change that would change the
      WA (e.g. a tiling change), take the wm_mutex lock and recalculate
      the complete bandwidth requirement. If the WA is changing, get the
      crtc_states of all other pipes and go ahead with the commit. (If
      the total display memory BW % is less than the lowest % that
      enables a WA, i.e. 20%, there is no need to recompute.)

      Update the per-CRTC bandwidth requirement in the global state so
      that other flips don’t need to recalculate it each time.
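    A minimal sketch of the Option 3 flip path, assuming a pthread mutex
    for wm_mutex, a hypothetical crtc_max_bw[] cache filled at modeset
    time, and a caller that already knows whether the flip touches an
    input of the WA decision (such as tiling). In this sketch, flips
    that touch such an input briefly take the lock even when the total
    stays below 20%; flips that don't never take it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PIPES 3
#define LOWEST_WA_PCT 20 /* lowest WA threshold quoted above */

enum flip_path { FLIP_INDEPENDENT, FLIP_RECHECKED, FLIP_WA_CHANGED };

/* Hypothetical global state: per-CRTC worst case and current WA. */
static pthread_mutex_t wm_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t crtc_max_bw[NUM_PIPES];
static bool wa_enabled;

/* Modeset: cache this CRTC's worst-case requirement. */
static void modeset_set_max_bw(unsigned int pipe, uint64_t max_bw)
{
    assert(pipe < NUM_PIPES);
    pthread_mutex_lock(&wm_mutex);
    crtc_max_bw[pipe] = max_bw;
    pthread_mutex_unlock(&wm_mutex);
}

/* Flip: affects_wa is true if e.g. tiling changes; pct is the new XX. */
static enum flip_path flip_option3(bool affects_wa, uint64_t system_bw,
                                   unsigned int pct)
{
    enum flip_path path = FLIP_RECHECKED;
    uint64_t total = 0;

    if (!affects_wa)
        return FLIP_INDEPENDENT; /* no lock taken at all */

    pthread_mutex_lock(&wm_mutex);
    for (unsigned int i = 0; i < NUM_PIPES; i++)
        total += crtc_max_bw[i];

    if (total * 100 <= system_bw * LOWEST_WA_PCT) {
        /* Below the lowest threshold no WA can apply: nothing to redo. */
        path = FLIP_INDEPENDENT;
    } else {
        bool wa = total * 100 > system_bw * pct;
        if (wa != wa_enabled) {
            /* Pull in all crtc_states and recompute their WMs. */
            wa_enabled = wa;
            path = FLIP_WA_CHANGED;
        }
    }
    pthread_mutex_unlock(&wm_mutex);
    return path;
}
```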

       
      Pros:

      
        All CRTCs can flip independently until there is a change that
          will impact the WA.
        No locking is needed until a potential WM WA change.
      
      Cons:

      
        If the memory bandwidth WA changes very frequently, there will be
          a slight performance impact.
        We may not program optimum WM values, which may have some
          power impact.
      
      
    If you think any other approach should be used, please let me know
      that as well.

    
    Regards,

    -Mahesh

  
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx