Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> When capturing the bo, we allocate an array for min(vma->size,
> vma->node.size) pages, plus a bit for compression overhead. Through my
> and CI testing, this was sufficient for the mostly empty NULL context as
> it compressed well (or the out-of-bounds access simply didn't cause an
> issue). However, in real workloads on Cannonlake, we were overflowing
> that array and causing havoc with the random memory corruption.
>

When capturing the error object, we allocate a struct for bookkeeping
plus an array for min(vma->size, vma->node.size) pages and a bit for
compression overhead. We use this mechanism when capturing the state
object by constructing a fake vma for it. We forgot to set the vma
size, so the allocation catered only for the bookkeeping struct, and
the capture overflowed it, causing havoc with random memory corruption.

This is how I see it, so with the above and including possible language
fixes,

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

> Reported-by: Rafael Antognolli <rafael.antognolli@xxxxxxxxx>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103964
> Fixes: 4e90a6e22272 ("drm/i915: Record default HW state in the GPU error state")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Tested-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 876be8f1d930..48418fb81066 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1424,6 +1424,7 @@ capture_object(struct drm_i915_private *dev_priv,
>  	if (obj && i915_gem_object_has_pages(obj)) {
>  		struct i915_vma fake = {
>  			.node = { .start = U64_MAX, .size = obj->base.size },
> +			.size = obj->base.size,
>  			.pages = obj->mm.pages,
>  			.obj = obj,
>  		};
> --
> 2.15.1
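
For anyone following along, here is a minimal userspace sketch of why the
missing initializer matters. It is not the driver code: struct sketch_vma,
struct sketch_node, SKETCH_PAGE_SHIFT and the helper names are hypothetical
stand-ins that keep only the two size fields the patch touches, and the
compression overhead is left out. It relies on the C rule that members
omitted from a designated initializer are zero-initialized, so min() over
the two sizes comes out as zero when .size is not set.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (assumption, not i915 code). */
struct sketch_node {
	uint64_t start;
	uint64_t size;
};

struct sketch_vma {
	struct sketch_node node;
	uint64_t size;
};

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

static uint64_t sketch_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Number of pages the capture array would be sized for. */
static uint64_t sketch_num_pages(const struct sketch_vma *vma)
{
	return sketch_min(vma->size, vma->node.size) >> SKETCH_PAGE_SHIFT;
}

/* Mirrors the initializer before the patch: .size is omitted, hence zero. */
static uint64_t pages_without_size(uint64_t obj_size)
{
	struct sketch_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
	};

	return sketch_num_pages(&fake);
}

/* Mirrors the initializer after the patch: .size is set explicitly. */
static uint64_t pages_with_size(uint64_t obj_size)
{
	struct sketch_vma fake = {
		.node = { .start = UINT64_MAX, .size = obj_size },
		.size = obj_size,
	};

	return sketch_num_pages(&fake);
}
```

With a 64 KiB object, pages_without_size(16 * 4096) gives 0 and
pages_with_size(16 * 4096) gives 16, which matches the description above:
without .size the array has room for no pages at all, only the bookkeeping
struct is allocated.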