On Fri, 2 Dec 2011 23:21:49 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> gpu reset is a very important piece of our infrastructure.
> Unfortunately we only really test it by actually hanging the gpu,
> which often has bad side-effects for the entire system. And the gpu
> hang handling code is one of the rather complicated pieces of code we
> have, consisting of
> - hang detection
> - error capture
> - actual gpu reset
> - reset of all the gem bookkeeping
> - reinitialization of the entire gpu
>
> This patch adds a debugfs file to selectively stop rings by ceasing to
> update the hw tail pointer, which will result in the gpu no longer
> updating its head pointer and eventually in the hangcheck firing.
> This way we can exercise the gpu hang code under controlled conditions
> without a dying gpu taking down the entire system.
>
> Patch motivated by me forgetting to properly reinitialize ppgtt after
> a gpu reset.
>
> Usage:
>
> echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
>
> echo 0xffffffff > i915_ring_stop # stops all, future-proof version
>
> then run whatever testload is desired. i915_ring_stop automatically
> resets after a gpu hang is detected to avoid hanging the gpu too fast
> and declaring it wedged.
>
> v2: Incorporate feedback from Chris Wilson.
>
> v3: Add the missing cleanup.

I think I've made my peace with this patch. I'm still not completely sold
on its value, but if Daniel found it useful then it has merit.
>
> Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch>

Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |   65 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.c         |    2 +
>  drivers/gpu/drm/i915/i915_drv.h         |    2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++
>  4 files changed, 73 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index db83552..85328f7 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1397,6 +1397,64 @@ static const struct file_operations i915_wedged_fops = {
>  };
>
>  static ssize_t
> +i915_ring_stop_read(struct file *filp,
> +		    char __user *ubuf,
> +		    size_t max,
> +		    loff_t *ppos)
> +{
> +	struct drm_device *dev = filp->private_data;
> +	drm_i915_private_t *dev_priv = dev->dev_private;
> +	char buf[80];
> +	int len;
> +
> +	len = snprintf(buf, sizeof(buf),
> +		       "%d\n", dev_priv->stop_rings);

%08x since it is a flags value, though 8 may be overkill!

> +
> +	if (len > sizeof(buf))
> +		len = sizeof(buf);
> +
> +	return simple_read_from_buffer(ubuf, max, ppos, buf, len);
> +}
> +
> +static ssize_t
> +i915_ring_stop_write(struct file *filp,
> +		     const char __user *ubuf,
> +		     size_t cnt,
> +		     loff_t *ppos)
> +{
> +	struct drm_device *dev = filp->private_data;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	char buf[20];
> +	int val = 0;
> +
> +	if (cnt > 0) {
> +		if (cnt > sizeof(buf) - 1)
> +			return -EINVAL;
> +
> +		if (copy_from_user(buf, ubuf, cnt))
> +			return -EFAULT;
> +		buf[cnt] = 0;
> +
> +		val = simple_strtoul(buf, NULL, 0);
> +	}
> +
> +	DRM_DEBUG_DRIVER("Stopping rings %u\n", val);

%x here as well

-- 
Chris Wilson, Intel Open Source Technology Centre