Quoting Mika Kuoppala (2017-07-19 12:51:04)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > Quoting Mika Kuoppala (2017-07-19 12:18:47)
> >> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
> >>
> >> > Workers on the i915->wq may rearm themselves so for completeness we need
> >> > to replace our flush_workqueue() with a call to drain_workqueue() before
> >> > unloading the device.
> >> >
> >> > v2: Reinforce the drain_workqueue with a preceding rcu_barrier() as a
> >> > few of the tasks that need to be drained may first be armed by RCU.
> >> >
> >> > References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
> >> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> > Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
> >> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_drv.c                  |  6 ++----
> >> >  drivers/gpu/drm/i915/i915_drv.h                  | 20 ++++++++++++++++++++
> >> >  drivers/gpu/drm/i915/selftests/mock_gem_device.c |  2 +-
> >> >  3 files changed, 23 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >> > index 4b62fd012877..41c5b11a7c8f 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.c
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> >> > @@ -596,7 +596,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
> >> >
> >> >  static void i915_gem_fini(struct drm_i915_private *dev_priv)
> >> >  {
> >> > -	flush_workqueue(dev_priv->wq);
> >> > +	/* Flush any outstanding unpin_work. */
> >> > +	i915_gem_drain_workqueue(dev_priv);
> >> >
> >> >  	mutex_lock(&dev_priv->drm.struct_mutex);
> >> >  	intel_uc_fini_hw(dev_priv);
> >> > @@ -1409,9 +1410,6 @@ void i915_driver_unload(struct drm_device *dev)
> >> >  	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> >> >  	i915_reset_error_state(dev_priv);
> >> >
> >> > -	/* Flush any outstanding unpin_work. */
> >> > -	drain_workqueue(dev_priv->wq);
> >> > -
> >> >  	i915_gem_fini(dev_priv);
> >> >  	intel_uc_fini_fw(dev_priv);
> >> >  	intel_fbc_cleanup_cfb(dev_priv);
> >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >> > index 667fb5c44483..e9a4b96dc775 100644
> >> > --- a/drivers/gpu/drm/i915/i915_drv.h
> >> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> >> > @@ -3300,6 +3300,26 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
> >> >  	} while (flush_work(&i915->mm.free_work));
> >> >  }
> >> >
> >> > +static inline void i915_gem_drain_workqueue(struct drm_i915_private *i915)
> >> > +{
> >> > +	/*
> >> > +	 * Similar to objects above (see i915_gem_drain_freed_objects), in
> >> > +	 * general we have workers that are armed by RCU and then rearm
> >> > +	 * themselves in their callbacks. To be paranoid, we need to
> >> > +	 * drain the workqueue a second time after waiting for the RCU
> >> > +	 * grace period so that we catch work queued via RCU from the first
> >> > +	 * pass. As neither drain_workqueue() nor flush_workqueue() report
> >> > +	 * a result, we make an assumption that we don't require more
> >> > +	 * than 2 passes to catch all recursive RCU delayed work.
> >> > +	 *
> >> > +	 */
> >> > +	int pass = 2;
> >> > +	do {
> >> > +		rcu_barrier();
> >> > +		drain_workqueue(i915->wq);
> >>
> >> I am fine with the paranoia, and it covers the case below. Still, if we do:
> >>
> >> drain_workqueue();
> >> rcu_barrier();
> >>
> >> With draining in progress, only chain queuing is allowed. I understand
> >> this to mean that when it returns, all the ctx pointers are now
> >> unreferenced but not freed.
> >>
> >> Thus the rcu_barrier() after it cleans the trash and we are good to
> >> be unloaded. With one pass.
> >>
> >> I guess it comes down to how to understand the comment, so could you
> >> elaborate on the 'we have workers that are armed by RCU and then rearm
> >> themselves'? Going by the drain_workqueue() description, this should
> >> be covered.
> >
> > I'm considering that they may be rearmed via RCU in the general case,
> > e.g. context close frees an object and so goes onto an RCU list that
> > once processed kicks off a new worker and so requires another round of
> > drain_workqueue. We are in module unload so a few extra delays to belts
> > and braces are ok until somebody notices it takes a few minutes to run a
> > reload test ;)
>
> Ok. Patch is
> Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

Thanks, I'm optimistic this will silence the bug, so marking it as
resolved. Pushed,
-Chris
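
As a reading aid for the exchange above, here is a minimal userspace sketch of the chain Chris describes: an RCU callback kicks off a worker, and that worker may hand off to RCU again. It is a toy model, not kernel code. `struct toy_dev`, `toy_rcu_barrier()`, `toy_drain_workqueue()` and `toy_drain()` are hypothetical names, and the `while (--passes)` loop tail is an assumption, since the quoted hunk stops at the drain_workqueue() call.

```c
#include <assert.h>

/* Toy model only: every name here is hypothetical, nothing is kernel API. */
struct toy_dev {
	int wq_pending;      /* work items sitting on the toy workqueue */
	int rcu_pending;     /* callbacks waiting for a toy grace period */
	int rcu_rearms_left; /* how many more times a worker hands off to RCU */
};

/* Run every queued RCU callback; in this model each one queues one worker. */
static void toy_rcu_barrier(struct toy_dev *d)
{
	d->wq_pending += d->rcu_pending;
	d->rcu_pending = 0;
}

/* Run workers until the queue is empty; a worker may arm one RCU callback. */
static void toy_drain_workqueue(struct toy_dev *d)
{
	while (d->wq_pending > 0) {
		d->wq_pending--;
		if (d->rcu_rearms_left > 0) {
			d->rcu_rearms_left--;
			d->rcu_pending++;
		}
	}
}

/*
 * Barrier-then-drain, repeated a fixed number of times (assumed loop shape).
 * Returns how much work and how many callbacks are still outstanding.
 */
static int toy_drain(struct toy_dev *d, int passes)
{
	assert(passes > 0);
	do {
		toy_rcu_barrier(d);
		toy_drain_workqueue(d);
	} while (--passes);
	return d->wq_pending + d->rcu_pending;
}
```

Under these assumptions, starting from one RCU-armed callback whose worker rearms once, `toy_drain(&d, 2)` leaves nothing pending while a single pass leaves one callback behind. A chain that rearms twice still outlives two passes, which is the assumption the patch comment spells out.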