Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > The signal threads may be running concurrently with the GPU reset. The > completion from the GPU run asynchronous with the reset and two threads > may see different snapshots of the state, and the signaler may mark a > request as complete as we try to reset it. We don't tolerate 2 different > views of the same state and complain if we try to mark a request as > failed if it is already complete. Disable the signal threads during > reset to prevent this conflict (even though the conflict implies that > the state we resetting to is invalid, we have already made our > decision!). > > References: https://bugs.freedesktop.org/show_bug.cgi?id=99671 > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++- > drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +++ > 2 files changed, 18 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 3c2c2296c1dc..730804ce9610 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -35,6 +35,7 @@ > #include "intel_frontbuffer.h" > #include "intel_mocs.h" > #include <linux/dma-fence-array.h> > +#include <linux/kthread.h> > #include <linux/reservation.h> > #include <linux/shmem_fs.h> > #include <linux/slab.h> > @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv) > for_each_engine(engine, dev_priv, id) { > struct drm_i915_gem_request *request; > > + /* Prevent the signaler thread from updating the request > + * state (by calling dma_fence_signal) as we are processing > + * the reset. The write from the GPU of the seqno is > + * asynchronous and the signaler thread may see a different > + * value to us and declare the request complete, even though > + * the reset routine have picked that request as the active > + * (incomplete) request. This conflict is not handled > + * gracefully! > + */ > + kthread_park(engine->breadcrumbs.signaler); > + > /* Prevent request submission to the hardware until we have > * completed the reset in i915_gem_reset_finish(). If a request > * is completed by one engine, it may then queue a request > @@ -2796,8 +2808,10 @@ void i915_gem_reset_finish(struct drm_i915_private *dev_priv) > > lockdep_assert_held(&dev_priv->drm.struct_mutex); > > - for_each_engine(engine, dev_priv, id) > + for_each_engine(engine, dev_priv, id) { > tasklet_enable(&engine->irq_tasklet); > + kthread_unpark(engine->breadcrumbs.signaler); > + } > } > > static void nop_submit_request(struct drm_i915_gem_request *request) > diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c > index 9fd002bcebb6..f5e05343110a 100644 > --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c > +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c > @@ -490,6 +490,9 @@ static int intel_breadcrumbs_signaler(void *arg) > break; > > schedule(); > + > + if (kthread_should_park()) > + kthread_parkme(); > } > } while (1); > __set_current_state(TASK_RUNNING); > -- > 2.11.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx