Hi Michel,

(For other readers: my reply was delayed on the mailing lists and should have appeared in second position.)

We had actually spotted this /0/i/ issue, but somehow I convinced myself it was intentional. The reasoning I came up with was that you wanted to set the fpfn only if there are 2 placements, which means it would try to move from CPU-accessible to CPU-inaccessible VRAM.

I will give that change a try and let you know. I do not remember whether I tried it for this soft lockup, but it certainly does not solve the hard lockup that Zach also mentioned at the end of his reply. I mention this because that other issue has some similarities (same ioctl call).

More generally, isn't "radeon_lockup_timeout" supposed to detect this situation?

Thx
Julien

On 24 March 2017 at 09:24, Michel Dänzer <michel at daenzer.net> wrote:
> On 23/03/17 06:26 PM, Julien Isorce wrote:
> > Hi Michel,
> >
> > When it happens, the main thread of our GL-based app is stuck in an
> > ioctl(RADEON_CS). I set RADEON_THREAD=false to ease the debugging, but
> > the same thing happens if it is true. The only other threads are
> > si_shader:0,1,2,3, and they are doing nothing, just waiting for jobs.
> > I can also do sudo gdb -p $(pidof Xorg) to block the X11 server, to
> > make sure there is no ping-pong between 2 processes. No other process
> > loads dri/radeonsi_dri.so. Adding a few traces shows that the above
> > ioctl call loops forever on
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/ttm/ttm_bo.c#L819
> > and comes from mesa at
> > https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/winsys/radeon/drm/radeon_drm_cs.c#n454
> >
> > After adding even more traces I can see that the bo, which is being
> > indefinitely evicted, has the flag RADEON_GEM_NO_CPU_ACCESS.
> > And it gets 3 potential placements after calling "radeon_evict_flags":
> >   1: VRAM CPU inaccessible, fpfn is 65536
> >   2: VRAM CPU accessible, fpfn is 0
> >   3: GTT, fpfn is 0
> >
> > And it looks like it continuously succeeds in moving it to the second
> > placement. So I might be wrong, but it looks like it is not even a
> > ping-pong between CPU-accessible and CPU-inaccessible VRAM; it just
> > keeps being blitted into the CPU-accessible part of the VRAM.
>
> Thanks for the detailed description! AFAICT this can only happen due to
> a silly mistake I made in this code. Does this fix it?
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 5c7cf644ba1d..37d68cd1f272 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -213,8 +213,8 @@ static void radeon_evict_flags(struct ttm_buffer_object *bo,
>  			rbo->placement.num_busy_placement = 0;
>  			for (i = 0; i < rbo->placement.num_placement; i++) {
>  				if (rbo->placements[i].flags & TTM_PL_FLAG_VRAM) {
> -					if (rbo->placements[0].fpfn < fpfn)
> -						rbo->placements[0].fpfn = fpfn;
> +					if (rbo->placements[i].fpfn < fpfn)
> +						rbo->placements[i].fpfn = fpfn;
>  				} else {
>  					rbo->placement.busy_placement =
>  						&rbo->placements[i];
>
>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel