Quoting Tvrtko Ursulin (2018-02-19 18:31:31)
>
> On 19/02/2018 14:01, Chris Wilson wrote:
> > If we fail to unbind the vma (due to a signal on an active buffer that
> > needs to be moved for the next execbuf), then we need to clear the
> > persistent tracking state we setup for this execbuf.
> >
> > Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
> > Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v4.14+
> > ---
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 51f3c32c64bf..4eb28e84fda4 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -505,6 +505,8 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
> >               list_add_tail(&vma->exec_link, &eb->unbound);
> >               if (drm_mm_node_allocated(&vma->node))
> >                       err = i915_vma_unbind(vma);
> > +             if (unlikely(err))
> > +                     vma->exec_flags = NULL;
> >       }
> >       return err;
> >   }
> >
>
> I was trying to track down what actually explodes for like 15 minutes.
>
> My track was:
>
> eb_relocate -> eb_lookup_vmas fails -> eb_relocate -> eb_relocate_slow
> -> eb_reset_vmas -> second pass to eb_lookup_vmas -> resets
> vma->exec_flags. So no explosion.
>
> So in other words I've failed to find what goes wrong and under which
> circumstances.

The first eb_relocate call into eb_lookup_vmas triggers the failure and
the exit from execbuf. In that path, we mark the current index as the
sentinel (err_vma: eb->vma[i] = NULL), which means we do not clear the
last vma when unwinding in eb_release_vmas.

So the vma->exec_flags was carried over into the next execbuf call from
userspace.
-Chris
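
For illustration only, the unwind described above can be modelled in a few
lines of standalone C. This is a rough sketch, not the driver code: the
toy_* names, the fixed count of three and the apply_fix switch are all
made up for the example, and only the shape of the logic (sentinel on
error, release loop stopping at the sentinel) follows the description.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_COUNT 3

/* Toy stand-ins for illustration; these are NOT the i915 structures. */
struct toy_vma {
	unsigned int *exec_flags;	/* non-NULL while claimed by an execbuf */
};

struct toy_eb {
	struct toy_vma *vma[TOY_COUNT];
	unsigned int flags[TOY_COUNT];
};

/* Track one vma; 'fail' stands in for a failed unbind. */
static int toy_add_vma(struct toy_eb *eb, unsigned int i,
		       struct toy_vma *vma, int fail, int apply_fix)
{
	eb->vma[i] = vma;
	eb->flags[i] = 0;
	vma->exec_flags = &eb->flags[i];

	if (fail) {
		if (apply_fix)
			vma->exec_flags = NULL;	/* what the patch adds */
		return -1;
	}
	return 0;
}

/* On error, the failing slot becomes the NULL sentinel. */
static int toy_lookup_vmas(struct toy_eb *eb, struct toy_vma *vmas,
			   unsigned int fail_at, int apply_fix)
{
	unsigned int i;

	for (i = 0; i < TOY_COUNT; i++) {
		if (toy_add_vma(eb, i, &vmas[i], i == fail_at, apply_fix)) {
			eb->vma[i] = NULL;
			return -1;
		}
	}
	return 0;
}

/* Unwind stops at the sentinel, so the failing vma is never visited. */
static void toy_release_vmas(struct toy_eb *eb)
{
	unsigned int i;

	for (i = 0; i < TOY_COUNT; i++) {
		struct toy_vma *vma = eb->vma[i];

		if (!vma)
			break;

		assert(vma->exec_flags == &eb->flags[i]);
		vma->exec_flags = NULL;
	}
}

/* Returns 1 if the failing vma still has a stale exec_flags pointer. */
int toy_stale_after_failure(int apply_fix)
{
	struct toy_vma vmas[TOY_COUNT] = { { NULL }, { NULL }, { NULL } };
	struct toy_eb eb = { { NULL }, { 0 } };
	const unsigned int fail_at = 1;

	toy_lookup_vmas(&eb, vmas, fail_at, apply_fix);
	toy_release_vmas(&eb);

	return vmas[fail_at].exec_flags != NULL;
}
```

In this model toy_stale_after_failure(0) returns 1 (the pointer survives
the unwind, as in the unpatched path) and toy_stale_after_failure(1)
returns 0.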