Re: [PATCH] drm/radeon: disable any GPU activity after unrecovered lockup v5

Michel Dänzer <michel@xxxxxxxxxxx> · Thu, 28 Jun 2012 10:56:25 +0200

On Wed, 2012-06-27 at 14:14 -0400, j.glisse@xxxxxxxxx wrote:
> From: Jerome Glisse <jglisse@xxxxxxxxxx>
> 
> After an unrecovered GPU lockup, avoid any GPU activity so that
> kernel segfaults and the like cannot happen in any of the
> paths that assume the hw is working.
> 
> The segfault is due to the PCIE vram gart table being unmapped after
> suspend in the GPU reset path. To avoid the segfault, and to avoid
> further GPU activity if resetting the GPU fails, we use the
> accel_working boolean to turn ttm activities into noops. This does
> not affect the module load path, because in that path ttm has an
> empty schedule queue and accel_working is set to true as soon as
> the gart table is in a valid state. Because ttm might have work
> queued, it is better to use accel_working than to disable the
> radeon_bo ioctls.
> 
> To trigger the segfault, launch a program that repeatedly creates
> bos in ttm and let it run in the background, then trigger a GPU
> lockup from another process.
> 
> This patch also forces the video mode to be restored on r1xx, r2xx,
> r3xx, r4xx, r5xx, rs4xx and rs6xx GPUs even if the GPU reset fails.
> When the GPU reset fails it is very likely (so far I have never seen
> it not work) that the modesetting part of the GPU is still alive.
> So by always restoring the video mode we have a chance to get a
> kernel backtrace or other debugging information on the screen.
> 
> v2: fix spelling error and disable accel before suspend and re-enable
>     it after pcie gart initialization to be even more cautious about
>     possible segfaults. Improve commit message.
> v3: Improve commit message to describe that the video mode is
>     restored no matter what.
> v4: Avoid an issue after successful GPU lockup recovery. Don't do a
>     noop ttm move; instead report an error if the move needs bind or
>     unbind, and fall back to memcpy otherwise. Don't restrict the new
>     bo domain; instead refuse to create new bos if the GPU reset
>     failed. Disable accel_working in the gart vram table unpin so we
>     don't change the behavior of the suspend path.
> v5: Prevent set domain from also triggering a noop bind/unbind;
>     instead force it to wait for the GPU reset to go through, or
>     return failure if the GPU reset fails.
> 
> cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jerome Glisse <jglisse@xxxxxxxxxx>

[...]

> +		/* try memcpy */
> +		goto memcpy;

This comment is redundant. :)

Either way though,

Reviewed-by: Michel Dänzer <michel.daenzer@xxxxxxx>

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel