On Mon, Jan 27, 2014 at 05:03:39PM +0100, Daniel Vetter wrote: > On Mon, Jan 27, 2014 at 01:06:33PM +0200, Ville Syrjälä wrote: > > On Sun, Jan 26, 2014 at 06:29:06PM +0100, Daniel Vetter wrote: > > > On Tue, Jan 21, 2014 at 04:12:38PM +0200, ville.syrjala@xxxxxxxxxxxxxxx wrote: > > > > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > > > > > > Add a mechanism by which we can evade the leading edge of vblank. This > > > > guarantees that no two sprite register writes will straddle on either > > > > side of the vblank start, and that means all the writes will be latched > > > > together in one atomic operation. > > > > > > > > We do the vblank evade by checking the scanline counter, and if it's too > > > > close to the start of vblank (too close has been hardcoded to 100usec > > > > for now), we will wait for the vblank start to pass. In order to > > > > eliminate random delayes from the rest of the system, we operate with > > > > interrupts disabled, except when waiting for the vblank obviously. > > > > > > > > Note that we now go digging through pipe_to_crtc_mapping[] in the > > > > vblank interrupt handler, which is a bit dangerous since we set up > > > > interrupts before the crtcs. However in this case since it's the vblank > > > > interrupt, we don't actually unmask it until some piece of code > > > > requests it. > > > > > > > > v2: preempt_check_resched() calls after local_irq_enable() (Jesse) > > > > Hook up the vblank irq stuff on BDW as well > > > > v3: Pass intel_crtc instead of drm_crtc (Daniel) > > > > Warn if crtc.mutex isn't locked (Daniel) > > > > Add an explicit compiler barrier and document the barriers (Daniel) > > > > Note the irq vs. modeset setup madness in the commit message (Daniel) > > > > > > > > Reviewed-by: Jesse Barnes <jbarnes@xxxxxxxxxxxxxxxx> > > > > Signed-off-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/i915/i915_irq.c | 29 ++++++++-- > > > > drivers/gpu/drm/i915/intel_display.c | 2 + > > > > drivers/gpu/drm/i915/intel_drv.h | 3 + > > > > drivers/gpu/drm/i915/intel_sprite.c | 103 +++++++++++++++++++++++++++++++++++ > > > > 4 files changed, 133 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > > > > index ffb56a9..8ef9edb 100644 > > > > --- a/drivers/gpu/drm/i915/i915_irq.c > > > > +++ b/drivers/gpu/drm/i915/i915_irq.c > > > > @@ -1439,6 +1439,15 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir) > > > > } > > > > } > > > > > > > > +static void intel_pipe_handle_vblank(struct drm_device *dev, enum pipe pipe) > > > > +{ > > > > + struct intel_crtc *intel_crtc = > > > > + to_intel_crtc(intel_get_crtc_for_pipe(dev, pipe)); > > > > + > > > > + intel_crtc->vbl_received = true; > > > > + wake_up(&intel_crtc->vbl_wait); > > > > +} > > > > + > > > > static irqreturn_t valleyview_irq_handler(int irq, void *arg) > > > > { > > > > struct drm_device *dev = (struct drm_device *) arg; > > > > @@ -1479,8 +1488,10 @@ static irqreturn_t valleyview_irq_handler(int irq, void *arg) > > > > spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); > > > > > > > > for_each_pipe(pipe) { > > > > - if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS) > > > > + if (pipe_stats[pipe] & PIPE_START_VBLANK_INTERRUPT_STATUS) { > > > > drm_handle_vblank(dev, pipe); > > > > + intel_pipe_handle_vblank(dev, pipe); > > > > + } > > > > > > > > if (pipe_stats[pipe] & PLANE_FLIPDONE_INT_STATUS_VLV) { > > > > intel_prepare_page_flip(dev, pipe); > > > > @@ -1679,8 +1690,10 @@ static void ilk_display_irq_handler(struct drm_device *dev, u32 de_iir) > > > > DRM_ERROR("Poison interrupt\n"); > > > > > > > > for_each_pipe(pipe) { > > > > - if (de_iir & DE_PIPE_VBLANK(pipe)) > > > > + if (de_iir & DE_PIPE_VBLANK(pipe)) { > > > > drm_handle_vblank(dev, pipe); > > > > + intel_pipe_handle_vblank(dev, pipe); > > > > + } > > > > > > > > if (de_iir & DE_PIPE_FIFO_UNDERRUN(pipe)) > > > > if (intel_set_cpu_fifo_underrun_reporting(dev, pipe, false)) > > > > @@ -1729,8 +1742,10 @@ static void ivb_display_irq_handler(struct drm_device *dev, u32 de_iir) > > > > intel_opregion_asle_intr(dev); > > > > > > > > for_each_pipe(i) { > > > > - if (de_iir & (DE_PIPE_VBLANK_IVB(i))) > > > > + if (de_iir & (DE_PIPE_VBLANK_IVB(i))) { > > > > drm_handle_vblank(dev, i); > > > > + intel_pipe_handle_vblank(dev, i); > > > > + } > > > > > > > > /* plane/pipes map 1:1 on ilk+ */ > > > > if (de_iir & DE_PLANE_FLIP_DONE_IVB(i)) { > > > > @@ -1872,8 +1887,10 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg) > > > > continue; > > > > > > > > pipe_iir = I915_READ(GEN8_DE_PIPE_IIR(pipe)); > > > > - if (pipe_iir & GEN8_PIPE_VBLANK) > > > > + if (pipe_iir & GEN8_PIPE_VBLANK) { > > > > drm_handle_vblank(dev, pipe); > > > > + intel_pipe_handle_vblank(dev, pipe); > > > > + } > > > > > > > > if (pipe_iir & GEN8_PIPE_FLIP_DONE) { > > > > intel_prepare_page_flip(dev, pipe); > > > > @@ -3161,6 +3178,8 @@ static bool i8xx_handle_vblank(struct drm_device *dev, > > > > if (!drm_handle_vblank(dev, pipe)) > > > > return false; > > > > > > > > + intel_pipe_handle_vblank(dev, pipe); > > > > + > > > > if ((iir & flip_pending) == 0) > > > > return false; > > > > > > > > @@ -3344,6 +3363,8 @@ static bool i915_handle_vblank(struct drm_device *dev, > > > > if (!drm_handle_vblank(dev, pipe)) > > > > return false; > > > > > > > > + intel_pipe_handle_vblank(dev, pipe); > > > > + > > > > if ((iir & flip_pending) == 0) > > > > return false; > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > > > > index d1cc7ea..b39e445 100644 > > > > --- a/drivers/gpu/drm/i915/intel_display.c > > > > +++ b/drivers/gpu/drm/i915/intel_display.c > > > > @@ -10297,6 +10297,8 @@ static void intel_crtc_init(struct drm_device *dev, int pipe) > > > > intel_crtc->plane = !pipe; > > > > } > > > > > > > > + init_waitqueue_head(&intel_crtc->vbl_wait); > > > > + > > > > BUG_ON(pipe >= ARRAY_SIZE(dev_priv->plane_to_crtc_mapping) || > > > > dev_priv->plane_to_crtc_mapping[intel_crtc->plane] != NULL); > > > > dev_priv->plane_to_crtc_mapping[intel_crtc->plane] = &intel_crtc->base; > > > > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h > > > > index 318bb4c..e539550 100644 > > > > --- a/drivers/gpu/drm/i915/intel_drv.h > > > > +++ b/drivers/gpu/drm/i915/intel_drv.h > > > > @@ -376,6 +376,9 @@ struct intel_crtc { > > > > /* watermarks currently being used */ > > > > struct intel_pipe_wm active; > > > > } wm; > > > > + > > > > + wait_queue_head_t vbl_wait; > > > > + bool vbl_received; > > > > }; > > > > > > > > struct intel_plane_wm_parameters { > > > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c > > > > index ed9fa7c..1ffa7ba 100644 > > > > --- a/drivers/gpu/drm/i915/intel_sprite.c > > > > +++ b/drivers/gpu/drm/i915/intel_sprite.c > > > > @@ -37,6 +37,79 @@ > > > > #include <drm/i915_drm.h> > > > > #include "i915_drv.h" > > > > > > > > +static unsigned int usecs_to_scanlines(const struct drm_display_mode *mode, unsigned int usecs) > > > > +{ > > > > + /* paranoia */ > > > > + if (!mode->crtc_htotal) > > > > + return 1; > > > > + > > > > + return DIV_ROUND_UP(usecs * mode->crtc_clock, 1000 * mode->crtc_htotal); > > > > +} > > > > + > > > > +static void intel_pipe_update_start(struct intel_crtc *crtc) > > > > +{ > > > > + struct drm_device *dev = crtc->base.dev; > > > > + const struct drm_display_mode *mode = &crtc->config.adjusted_mode; > > > > + enum pipe pipe = crtc->pipe; > > > > + /* FIXME needs to be calibrated sensibly */ > > > > + unsigned int min = mode->crtc_vblank_start - usecs_to_scanlines(mode, 100); > > > > + unsigned int max = mode->crtc_vblank_start - 1; > > > > + long timeout = msecs_to_jiffies_timeout(1); > > > > + unsigned int scanline; > > > > + > > > > + WARN_ON(!mutex_is_locked(&crtc->base.mutex)); > > > > + > > > > + if (WARN_ON(drm_vblank_get(dev, pipe))) > > > > + return; > > > > + > > > > + local_irq_disable(); > > > > + > > > > + /* > > > > + * Explicit compiler barrier is used to make sure that vbl_received > > > > + * store happens before reading the current scanline, otherwise > > > > + * something like this this might happen: > > > > + * > > > > + * 1. scanline = intel_get_crtc_scanline(); > > > > + * 2. vblank irq -> vbl_received = true > > > > + * 3. vbl_received = false > > > > + * > > > > + * Which could make us miss the fact that we already crossed into > > > > + * vblank even though the scanline indicates that we're somewhere > > > > + * just before the start of vblank. > > > > + * > > > > + * wake_up() vs. wait_event_timeout() imply the necessary memory > > > > + * barries to make sure wait_event_timeout() sees the correct > > > > + * value of vbl_received after waking up. > > > > + */ > > > > + crtc->vbl_received = false; > > > > + barrier(); > > > > + scanline = intel_get_crtc_scanline(&crtc->base); > > > > > > Ok, took a bit longer but here's it. I think we need a bit more polish > > > with this one. My thoughts about all this: > > > > > > - Definitely an s/barrier/smp_*/ required. Yeah, on x86 with the total > > > store order it's ok, but a big reason for adding barriers and comments > > > is documentation. And a big reason for that is to deter people from too > > > many lockless tricks ;-) > > > > I still don't see why you'd need an smp_ memory barrier here. It's just > > one regular store vs. mmio read ordering issue we have here. > > > > If we'd really need to worry about hardware reordering the operations, > > then it would seem that we'd need a mandatory memory barrier instead > > of an smp one. > > Hm yeah, full barrier makes more sense. I'd also throw a full barrier in > _before_ we reset vbl_received in the interrupt handling code. > > Also a style nitpick with barrier comments is to have it a both places and > reference the place of the other one. So I think we should throw in an > another comment in intel_pipe_handle_vblank. Actually two to mention that > the wake_up implicit parrier syncs with the implicit barrier in wait_event > in intel_pipe_update_start. With that I'll shut up about my nagging ;-) I still think a compiler barrier is all we need here. We only have one piece of shared data, and any ordering restriction we have is strictly within one CPU (vbl_received=false store vs. scanline read), so normal cache coherency is enough to handle the irq handler vs. waiter situation. I could be convinced to use mb() between the vbl_received store and scanline read for paranoia's sake, but adding a mb() to the irq handler I don't get at all. I suppose I could add it there just for fun, but then then the comment would have to be: /* danvet's memory barrier */ since I wouldn't know what else to write ;) -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx