Em Qua, 2015-10-21 às 09:24 +0200, Daniel Vetter escreveu: > On Wed, Oct 21, 2015 at 10:20:55AM +0300, Ville Syrjälä wrote: > > On Wed, Oct 21, 2015 at 09:11:08AM +0200, Daniel Vetter wrote: > > > On Tue, Oct 20, 2015 at 11:50:01AM -0200, Paulo Zanoni wrote: > > > > One of the problems with the current code is that it frees the > > > > CFB and > > > > releases its drm_mm node as soon as we flip FBC's enable bit. > > > > This is > > > > bad because after we disbale FBC the hardware may still use the > > > > CFB > > > > for the rest of the frame, so in theory we should only release > > > > the > > > > drm_mm node one frame after we disable FBC. Otherwise, a stolen > > > > memory > > > > allocation done right after an FBC disable may result in either > > > > corrupted memory for the new owner of that memory region or > > > > corrupted > > > > screen/underruns in case the new owner changes it while the > > > > hardware > > > > is still reading it. This case is not exactly easy to reproduce > > > > since > > > > we currently don't do a lot of stolen memory allocations, but I > > > > see > > > > patches on the mailing list trying to expose stolen memory to > > > > user > > > > space, so races will be possible. > > > > > > > > I thought about three different approaches to solve this, and > > > > they all > > > > have downsides. > > > > > > > > The first approach would be to simply use multiple drm_mm nodes > > > > and > > > > freeing the unused ones only after a frame has passed. The > > > > problem > > > > with this approach is that since stolen memory is rather small, > > > > there's a risk we just won't be able to allocate a new CFB from > > > > stolen > > > > if the previous one was not freed yet. This could happen in > > > > case we > > > > quickly disable FBC from pipe A and decide to enable it on pipe > > > > B, or > > > > just if we change pipe A's fb stride while FBC is enabled. > > > > > > > > The second approach would be similar to the first one, but > > > > maintaining > > > > a single drm_mm node and keeping track of when it can be > > > > reused. This > > > > would remove the disadvantage of not having enough space for > > > > two > > > > nodes, but would create the new problem where we may not be > > > > able to > > > > enable FBC at the point intel_fbc_update() is called, so we > > > > would have > > > > to add more code to retry updating FBC after the time has > > > > passed. And > > > > that can quickly get too complex since we can get invalidate, > > > > flush, > > > > flip_prepare, disable and other calls in the middle of the > > > > wait. > > > > > > > > Both solutions above - and also the current code - have the > > > > problem > > > > that we unnecessarily free+realloc FBC during invalidate+flush > > > > operations even if the CFB size doesn't change. > > > > > > > > The third option would be to move the allocation/deallocation > > > > to > > > > enable/disable. This makes sure that the pipe is always > > > > disabled when > > > > we allocate/deallocate the CFB, so there's no risk that the FBC > > > > hardware may read or write to the memory right after it is > > > > freed from > > > > drm_mm. The downside is that it is possible for user space to > > > > change > > > > the buffer stride without triggering a disable/enable - only > > > > deactivate/activate -, so we'll have to handle this case > > > > somehow, even > > > > though it is uncommon - see igt's kms_frontbuffer_tracking > > > > test, > > > > fbc-stridechange subtest. It could be possible to implement a > > > > way to > > > > free+alloc the CFB during said stride change, but it would > > > > involve a > > > > lot of book-keeping - exactly as mentioned above - just for a > > > > rare > > > > case, so for now I'll keep it simple and just deactivate FBC. > > > > Besides, > > > > we may not even need to disable FBC since we do CFB over- > > > > allocation. > > > > > > > > Signed-off-by: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/i915/intel_fbc.c | 132 ++++++++++++++++++++--- > > > > ---------------- > > > > 1 file changed, 68 insertions(+), 64 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_fbc.c > > > > b/drivers/gpu/drm/i915/intel_fbc.c > > > > index bf855b2..48d8cfc 100644 > > > > --- a/drivers/gpu/drm/i915/intel_fbc.c > > > > +++ b/drivers/gpu/drm/i915/intel_fbc.c > > > > @@ -59,6 +59,45 @@ static unsigned int > > > > get_crtc_fence_y_offset(struct intel_crtc *crtc) > > > > return crtc->base.y - crtc->adjusted_y; > > > > } > > > > > > > > +/* > > > > + * For SKL+, the plane source size used by the hardware is > > > > based on the value we > > > > + * write to the PLANE_SIZE register. For BDW-, the hardware > > > > looks at the value > > > > + * we wrote to PIPESRC. > > > > + */ > > > > +static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > *crtc, > > > > + int *width, int > > > > *height) > > > > +{ > > > > + struct intel_plane_state *plane_state = > > > > + to_intel_plane_state(crtc- > > > > >base.primary->state); > > > > + int w, h; > > > > + > > > > + if (intel_rotation_90_or_270(plane_state- > > > > >base.rotation)) { > > > > + w = drm_rect_height(&plane_state->src) >> 16; > > > > + h = drm_rect_width(&plane_state->src) >> 16; > > > > + } else { > > > > + w = drm_rect_width(&plane_state->src) >> 16; > > > > + h = drm_rect_height(&plane_state->src) >> 16; > > > > + } > > > > + > > > > + if (width) > > > > + *width = w; > > > > + if (height) > > > > + *height = h; > > > > +} > > > > + > > > > +static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > *crtc, > > > > + struct drm_framebuffer > > > > *fb) > > > > +{ > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > + int lines; > > > > + > > > > + intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > + if (INTEL_INFO(dev_priv)->gen >= 7) > > > > + lines = min(lines, 2048); > > > > + > > > > + return lines * fb->pitches[0]; > > > > > > Why are we even looking at fb->pitches here? I thought that for > > > fbc we > > > only compress what we actually scan out, i.e. the source > > > rectangle of the > > > plane in pixels. Maybe some rounding involved perhaps. > > > > IIRC the docs do say the hw uses the plane stride register figure > > this > > out. It's something I've been wondering for a few years now, but > > never > > actually tested to see if we can get away with less. So we have a volunteer to implement the stolen memory checker? :) > In that case a comment would be great, maybe even with the Bspec > citation. Well, the docs talk about the stride and the HW guys confirmed it, so it didn't seem to be as something that needed a comment. You can also git-blame it :) http://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915/intel _fbc.c?id=c4ffd40908c30a33291227920e921f6b45b9e8f7 > -Daniel > > > > > > > > > Or did I miss something? > > > -Daniel > > > > > > > +} > > > > + > > > > static void i8xx_fbc_deactivate(struct drm_i915_private > > > > *dev_priv) > > > > { > > > > u32 fbc_ctl; > > > > @@ -604,11 +643,17 @@ again: > > > > } > > > > } > > > > > > > > -static int intel_fbc_alloc_cfb(struct drm_i915_private > > > > *dev_priv, int size, > > > > - int fb_cpp) > > > > +static int intel_fbc_alloc_cfb(struct intel_crtc *crtc) > > > > { > > > > + struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > + struct drm_framebuffer *fb = crtc->base.primary- > > > > >state->fb; > > > > struct drm_mm_node *uninitialized_var(compressed_llb); > > > > - int ret; > > > > + int size, fb_cpp, ret; > > > > + > > > > + WARN_ON(dev_priv->fbc.compressed_fb.allocated); > > > > + > > > > + size = intel_fbc_calculate_cfb_size(crtc, fb); > > > > + fb_cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > > > > > ret = find_compression_threshold(dev_priv, &dev_priv- > > > > >fbc.compressed_fb, > > > > size, fb_cpp); > > > > @@ -685,64 +730,6 @@ void intel_fbc_cleanup_cfb(struct > > > > drm_i915_private *dev_priv) > > > > mutex_unlock(&dev_priv->fbc.lock); > > > > } > > > > > > > > -/* > > > > - * For SKL+, the plane source size used by the hardware is > > > > based on the value we > > > > - * write to the PLANE_SIZE register. For BDW-, the hardware > > > > looks at the value > > > > - * we wrote to PIPESRC. > > > > - */ > > > > -static void intel_fbc_get_plane_source_size(struct intel_crtc > > > > *crtc, > > > > - int *width, int > > > > *height) > > > > -{ > > > > - struct intel_plane_state *plane_state = > > > > - to_intel_plane_state(crtc- > > > > >base.primary->state); > > > > - int w, h; > > > > - > > > > - if (intel_rotation_90_or_270(plane_state- > > > > >base.rotation)) { > > > > - w = drm_rect_height(&plane_state->src) >> 16; > > > > - h = drm_rect_width(&plane_state->src) >> 16; > > > > - } else { > > > > - w = drm_rect_width(&plane_state->src) >> 16; > > > > - h = drm_rect_height(&plane_state->src) >> 16; > > > > - } > > > > - > > > > - if (width) > > > > - *width = w; > > > > - if (height) > > > > - *height = h; > > > > -} > > > > - > > > > -static int intel_fbc_calculate_cfb_size(struct intel_crtc > > > > *crtc) > > > > -{ > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > - int lines; > > > > - > > > > - intel_fbc_get_plane_source_size(crtc, NULL, &lines); > > > > - if (INTEL_INFO(dev_priv)->gen >= 7) > > > > - lines = min(lines, 2048); > > > > - > > > > - return lines * fb->pitches[0]; > > > > -} > > > > - > > > > -static int intel_fbc_setup_cfb(struct intel_crtc *crtc) > > > > -{ > > > > - struct drm_i915_private *dev_priv = crtc->base.dev- > > > > >dev_private; > > > > - struct drm_framebuffer *fb = crtc->base.primary->fb; > > > > - int size, cpp; > > > > - > > > > - size = intel_fbc_calculate_cfb_size(crtc); > > > > - cpp = drm_format_plane_cpp(fb->pixel_format, 0); > > > > - > > > > - if (dev_priv->fbc.compressed_fb.allocated && > > > > - size <= dev_priv->fbc.compressed_fb.size * > > > > dev_priv->fbc.threshold) > > > > - return 0; > > > > - > > > > - /* Release any current block */ > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > - > > > > - return intel_fbc_alloc_cfb(dev_priv, size, cpp); > > > > -} > > > > - > > > > static bool stride_is_valid(struct drm_i915_private *dev_priv, > > > > unsigned int stride) > > > > { > > > > @@ -904,8 +891,19 @@ static void __intel_fbc_update(struct > > > > intel_crtc *crtc) > > > > goto out_disable; > > > > } > > > > > > > > - if (intel_fbc_setup_cfb(crtc)) { > > > > - set_no_fbc_reason(dev_priv, "not enough stolen > > > > memory"); > > > > + /* It is possible for the required CFB size change > > > > without a > > > > + * crtc->disable + crtc->enable since it is possible > > > > to change the > > > > + * stride without triggering a full modeset. Since we > > > > try to > > > > + * over-allocate the CFB, there's a chance we may keep > > > > FBC enabled even > > > > + * if this happens, but if we exceed the current CFB > > > > size we'll have to > > > > + * disable FBC. Notice that it would be possible to > > > > disable FBC, wait > > > > + * for a frame, free the stolen node, then try to > > > > reenable FBC in case > > > > + * we didn't get any invalidate/deactivate calls, but > > > > this would require > > > > + * a lot of tracking just for a case we expect to be > > > > uncommon, so we > > > > + * just don't have this code for now. */ > > > > + if (intel_fbc_calculate_cfb_size(crtc, fb) > > > > > + dev_priv->fbc.compressed_fb.size * dev_priv- > > > > >fbc.threshold) { > > > > + set_no_fbc_reason(dev_priv, "CFB requirements > > > > changed"); > > > > goto out_disable; > > > > } > > > > > > > > @@ -958,7 +956,6 @@ out_disable: > > > > DRM_DEBUG_KMS("unsupported config, > > > > deactivating FBC\n"); > > > > __intel_fbc_deactivate(dev_priv); > > > > } > > > > - __intel_fbc_cleanup_cfb(dev_priv); > > > > } > > > > > > > > /* > > > > @@ -1102,6 +1099,11 @@ void intel_fbc_enable(struct intel_crtc > > > > *crtc) > > > > goto out; > > > > } > > > > > > > > + if (intel_fbc_alloc_cfb(crtc)) { > > > > + set_no_fbc_reason(dev_priv, "not enough stolen > > > > memory"); > > > > + goto out; > > > > + } > > > > + > > > > DRM_DEBUG_KMS("Enabling FBC on pipe %c\n", > > > > pipe_name(crtc->pipe)); > > > > dev_priv->fbc.no_fbc_reason = "FBC enabled but not > > > > active yet\n"; > > > > > > > > @@ -1129,6 +1131,8 @@ static void __intel_fbc_disable(struct > > > > drm_i915_private *dev_priv) > > > > > > > > DRM_DEBUG_KMS("Disabling FBC on pipe %c\n", > > > > pipe_name(crtc->pipe)); > > > > > > > > + __intel_fbc_cleanup_cfb(dev_priv); > > > > + > > > > dev_priv->fbc.enabled = false; > > > > dev_priv->fbc.crtc = NULL; > > > > } > > > > -- > > > > 2.6.1 > > > > > > > > _______________________________________________ > > > > Intel-gfx mailing list > > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > > > -- > > > Daniel Vetter > > > Software Engineer, Intel Corporation > > > http://blog.ffwll.ch > > > _______________________________________________ > > > Intel-gfx mailing list > > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > > > -- > > Ville Syrjälä > > Intel OTC > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx