Hi Michel,
(Note for other readers: my reply was delayed by the mailing list and should have appeared second in the thread.)
We had actually spotted this /0/i/ but somehow I convinced myself it was intentional. My reasoning was that you wanted to set the fpfn only if there are 2 placements, which means it will try to move from accessible to inaccessible.
I will have a go with that change and let you know. I do not remember if I tried it for this soft lockup. But for sure it does not solve the hard lockup that Zach also mentioned at the end of his reply. I am saying that because this other issue has some similarities (same ioctl call).
But in general, isn't "radeon_lockup_timeout" supposed to detect this situation?
Thx
Julien
On 24 March 2017 at 09:24, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
On 23/03/17 06:26 PM, Julien Isorce wrote:
> Hi Michel,
>
> When it happens, the main thread of our GL-based app is stuck on an
> ioctl(RADEON_CS). I set RADEON_THREAD=false to ease the debugging but
> the same thing happens if true. Other threads are only si_shader:0,1,2,3
> and are doing nothing, just waiting for jobs. I can also do sudo gdb -p
> $(pidof Xorg) to block the X11 server, to make sure there is no ping
> pong between 2 processes. None of the other processes load
> dri/radeonsi_dri.so. And adding a few traces shows that the above ioctl
> call is looping forever on
> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/ttm/ttm_bo.c#L819
> and comes from mesa
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/winsys/radeon/drm/radeon_drm_cs.c#n454
> .
>
> After adding even more traces I can see that the bo, which is being
> indefinitely evicted, has the flag RADEON_GEM_NO_CPU_ACCESS.
> And it gets 3 potential placements after calling "radeon_evict_flags".
> 1: VRAM cpu inaccessible, fpfn is 65536
> 2: VRAM cpu accessible, fpfn is 0
> 3: GTT, fpfn is 0
>
> And it looks like it continuously succeeds in moving to the second
> placement. So I might be wrong but it looks like it is not even a ping
> pong between VRAM accessible / not accessible, it just keeps being
> blitted in the CPU accessible part of the VRAM.

Thanks for the detailed description! AFAICT this can only happen due to
a silly mistake I made in this code. Does this fix it?
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5c7cf644ba1d..37d68cd1f272 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -213,8 +213,8 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
 			rbo->placement.num_busy_placement = 0;
 			for (i = 0; i < rbo->placement.num_placement; i++) {
 				if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
-					if (rbo->placements[0].fpfn < fpfn)
-						rbo->placements[0].fpfn = fpfn;
+					if (rbo->placements[i].fpfn < fpfn)
+						rbo->placements[i].fpfn = fpfn;
 				} else {
 					rbo->placement.busy_placement =
 						&rbo->placements[i];
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel