[PATCH] Revert "drm/radeon: Try evicting from CPU accessible to inaccessible VRAM first"

julien.isorce@xxxxxxxxx (Julien Isorce) · Fri, 24 Mar 2017 09:50:36 +0000

Hi Michel,

(For other readers: my reply was delayed on the mailing lists and should have
appeared second in this thread.)

We had actually spotted this /0/i/ issue, but somehow I convinced myself it
was intentional. The reason I came up with was that you wanted to set the fpfn
only when there are 2 placements, which means it would try to move the buffer
from CPU-accessible to CPU-inaccessible VRAM.

I will try that change and let you know. I do not remember whether I tried it
for this soft lockup. But it certainly does not solve the hard lockup that
Zach also mentioned at the end of his reply. I mention it because that other
issue has some similarities (same ioctl call).

But in general, isn't "radeon_lockup_timeout" supposed to detect this
situation?

Thx
Julien

On 24 March 2017 at 09:24, Michel Dänzer <michel at daenzer.net> wrote:

> On 23/03/17 06:26 PM, Julien Isorce wrote:
> > Hi Michel,
> >
> > When it happens, the main thread of our GL-based app is stuck in an
> > ioctl(RADEON_CS). I set RADEON_THREAD=false to ease debugging, but the
> > same thing happens if it is true. The only other threads are
> > si_shader:0,1,2,3, which are idle, just waiting for jobs. I can also run
> > sudo gdb -p $(pidof Xorg) to block the X11 server, to make sure there is
> > no ping-pong between two processes. No other process loads
> > dri/radeonsi_dri.so. Adding a few traces shows that the above ioctl call
> > is looping forever on
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/ttm/ttm_bo.c#L819
> > and comes from this call in Mesa:
> > https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/winsys/radeon/drm/radeon_drm_cs.c#n454
> >
> > After adding even more traces, I can see that the bo being indefinitely
> > evicted has the flag RADEON_GEM_NO_CPU_ACCESS, and it gets 3 potential
> > placements after calling "radeon_evict_flags":
> >  1: VRAM cpu inaccessible, fpfn is 65536
> >  2: VRAM cpu accessible, fpfn is 0
> >  3: GTT, fpfn is 0
> >
> > And it looks like it continuously succeeds in moving to the second
> > placement. So I might be wrong, but it looks like it is not even a
> > ping-pong between accessible and inaccessible VRAM; it just keeps being
> > blitted within the CPU-accessible part of VRAM.
>
> Thanks for the detailed description! AFAICT this can only happen due to
> a silly mistake I made in this code. Does this fix it?
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 5c7cf644ba1d..37d68cd1f272 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -213,8 +213,8 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
>                         rbo->placement.num_busy_placement = 0;
>                         for (i = 0; i < rbo->placement.num_placement; i++) {
>                                 if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
> -                                       if (rbo->placements[0].fpfn < fpfn)
> -                                               rbo->placements[0].fpfn = fpfn;
> +                                       if (rbo->placements[i].fpfn < fpfn)
> +                                               rbo->placements[i].fpfn = fpfn;
>                                 } else {
>                                         rbo->placement.busy_placement =
>                                                 &rbo->placements[i];
>
>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>