Re: Reworking of GPU reset logic

On 21.04.2012 16:14, Jerome Glisse wrote:
2012/4/21 Christian König<deathsimple@xxxxxxxxxxx>:
On 20.04.2012 01:47, Jerome Glisse wrote:
2012/4/19 Christian König<deathsimple@xxxxxxxxxxx>:
This includes mostly fixes for multi ring lockups and GPU resets, but it
should generally improve the behavior of the kernel mode driver in case
something goes badly wrong.

On the other hand it completely rewrites the IB pool and semaphore
handling, so I think there are still a couple of problems in it.

The first four patches were already sent to the list, but the current set
depends on them, so I am resending them.

Cheers,
Christian.
I did a quick review and it looks mostly good, but as it's sensitive code
I would like to spend some time on it, probably next week. Note that I had
some work in this area too: I mostly want to drop all the debugfs files
related to this and add something new and more useful (basically something
that allows you to read all the data needed to replay an IB that locks up).
I was also looking into Dave's reset thread, and your solution of moving the
reset into the ioctl return path sounds good too, but I need to convince
myself that it encompasses all possible cases.

Cheers,
Jerome

After sleeping on it for a night I already reworked the patch for improving
the SA performance, so please wait at least for v2 before taking a look at
it :)

Regarding the debugging of lockups I had the following on my "in mind todo"
list:
1. Rework the chip specific lockup detection code a bit more and probably
clean it up a bit.
2. Make the timeout a module parameter, because compute tasks sometimes block a
ring for more than 10 seconds.
3. Keep track of the actual RPTR offset a fence is emitted to.
4. Keep track of all the BOs an IB is touching.
5. Now if a lockup happens, start with the last successfully signaled fence
and dump the ring content after that RPTR offset up to the first unsignaled
fence.
6. Then, if this fence references an IB, dump its content and the BOs it
is touching.
7. Dump everything on the ring after that fence until you reach the RPTR of
the next fence or the WPTR of the ring.
8. If there is a next fence, repeat the whole thing from number 6.
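
For illustration only, the ring walk in the list above could look roughly
like the user-space sketch below. This is not driver code: struct fence_rec,
dump_range() and dump_after_lockup() are made-up names, the ring is a plain
array, and the sketch assumes the fences are ordered by emission and that the
ring started at offset 0. Dumping the IB and BO contents is only marked by a
printf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bookkeeping: one record per emitted fence. */
struct fence_rec {
    uint32_t rptr;      /* ring offset (in dwords) the fence was emitted to */
    bool     signaled;  /* has the GPU signaled this fence? */
    int      ib_index;  /* IB the fence references, -1 if none */
};

/* Print ring dwords in [start, end), wrapping at ring_size.
 * Returns the number of dwords printed (0 if start == end). */
static size_t dump_range(const uint32_t *ring, uint32_t ring_size,
                         uint32_t start, uint32_t end)
{
    size_t n = 0;

    for (uint32_t i = start % ring_size; i != end % ring_size;
         i = (i + 1) % ring_size) {
        printf("ring[%u] = 0x%08x\n", (unsigned)i, (unsigned)ring[i]);
        n++;
    }
    return n;
}

/* Start after the last signaled fence, dump each ring segment up to the
 * next fence, note the IB it references, and finish at wptr.
 * Returns the total number of ring dwords dumped. */
static size_t dump_after_lockup(const uint32_t *ring, uint32_t ring_size,
                                const struct fence_rec *fences, size_t nfences,
                                uint32_t wptr)
{
    size_t first_unsignaled = 0;
    uint32_t start = 0;  /* assumption: nothing before the first fence matters */
    size_t total = 0;

    /* Skip everything the GPU has already completed. */
    while (first_unsignaled < nfences && fences[first_unsignaled].signaled) {
        start = fences[first_unsignaled].rptr;
        first_unsignaled++;
    }

    for (size_t i = first_unsignaled; i < nfences; i++) {
        total += dump_range(ring, ring_size, start, fences[i].rptr);
        if (fences[i].ib_index >= 0)
            printf("fence %zu references IB %d: dump IB and its BOs here\n",
                   i, fences[i].ib_index);
        start = fences[i].rptr;
    }

    /* Remainder between the last fence and the write pointer. */
    total += dump_range(ring, ring_size, start, wptr);
    return total;
}
```

With a 16 dword ring, one signaled fence at RPTR 4, two unsignaled fences at
RPTR 8 and 12, and WPTR at 14, this walks offsets 4 to 13.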

If I'm not completely wrong, that should give you practically all the
information available, and we probably should put that behind another module
option, because we are going to spam syslog pretty heavily here. Feel free to
add/modify the ideas on this list.

Christian.
What I have is similar. I am assuming only IBs trigger lockups: before each IB,
emit the IB offset in the SA and the IB size to a scratch reg. For each IB, keep
a BO list. On lockup, allocate a big chunk of memory to copy the whole IB and
all the BOs referenced by the IB (I am using my bof format as I already have
userspace tools).
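
As a rough user-space sketch of that scheme (not the actual patch): the
scratch registers are modelled as a plain struct, the SA pool as an array of
dwords, and all names (struct scratch, emit_ib_marker(), capture_faulty_ib())
are invented for the example. Using two scratch values is an assumption, and
copying the referenced BOs and the bof output are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for GPU scratch registers (hypothetical layout). */
struct scratch {
    uint32_t ib_offset;  /* offset of the current IB inside the SA pool, in dwords */
    uint32_t ib_size;    /* size of the current IB, in dwords */
};

/* What would be emitted on the ring right before each IB. */
static void emit_ib_marker(struct scratch *s, uint32_t offset, uint32_t size)
{
    s->ib_offset = offset;
    s->ib_size = size;
}

/* On lockup: copy the IB named by the scratch values out of the pool.
 * Returns a malloc'ed copy the caller must free, or NULL if the recorded
 * range does not fit in the pool or the allocation fails. */
static uint32_t *capture_faulty_ib(const struct scratch *s,
                                   const uint32_t *sa_pool, uint32_t pool_size)
{
    uint32_t *copy;

    if (s->ib_size == 0 || s->ib_offset > pool_size ||
        s->ib_size > pool_size - s->ib_offset)
        return NULL;

    copy = malloc((size_t)s->ib_size * sizeof(*copy));
    if (copy)
        memcpy(copy, sa_pool + s->ib_offset,
               (size_t)s->ib_size * sizeof(*copy));
    return copy;
}
```

Because the marker is overwritten before every IB, the values read back after
a lockup point at the IB that was executing (or about to execute) when the
ring stopped.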

Remove all the debugfs files. Just add a new one that gives you the first faulty
IB. On read of this file the kernel frees the memory. The kernel should also free
the memory after a while, or better, the lockup copy would be enabled only if
some kernel radeon option is set.
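
The "keep only the first faulty IB, free it on read" behaviour can be
sketched in plain user-space C as below. This only models the lifetime of the
buffer; it is not a debugfs implementation, and struct lockup_dump,
dump_store() and dump_read_once() are hypothetical names. Locking, partial
reads and the timeout-based free are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the captured lockup data. */
struct lockup_dump {
    void  *data;   /* NULL when no dump is pending */
    size_t size;
};

/* Keep a copy of the first captured dump. Returns 0 on success, -1 if a
 * dump is already pending, size is 0, or the allocation fails. */
static int dump_store(struct lockup_dump *d, const void *src, size_t size)
{
    void *copy;

    if (d->data || size == 0)
        return -1;
    copy = malloc(size);
    if (!copy)
        return -1;
    memcpy(copy, src, size);
    d->data = copy;
    d->size = size;
    return 0;
}

/* Copy up to len bytes to dst, then free the dump. Returns the number of
 * bytes copied; a second read without a new dump returns 0. */
static size_t dump_read_once(struct lockup_dump *d, void *dst, size_t len)
{
    size_t n;

    if (!d->data)
        return 0;
    n = d->size < len ? d->size : len;
    memcpy(dst, d->data, n);
    free(d->data);
    d->data = NULL;
    d->size = 0;
    return n;
}
```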

Just resent my current patchset to the mailing list; it's not as complete as your solution, but it seems to be a step in the right direction. So please take a look at it.

Being able to generate something like a "GPU crash dump" on lockup sounds very valuable to me, but I'm not sure if debugfs files are the right direction to go. Maybe something more like a module parameter containing a directory, and if it is set we dump all the information available (including BO content) in binary form (instead of the current human readable form of the debugfs files).

Anyway, the patchset I just sent solves the problem I'm currently looking into, and I'm running a bit out of time (again). So I don't know if I can complete that solution....

Cheers,
Christian.


