Re: GPU lockup dumping

On 23.05.2012 19:02, Jerome Glisse wrote:
On Wed, May 23, 2012 at 12:41 PM, Dave Airlie<airlied@xxxxxxxxx>  wrote:
On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse<j.glisse@xxxxxxxxx>  wrote:
On Wed, May 23, 2012 at 12:08 PM, Dave Airlie<airlied@xxxxxxxxx>  wrote:
On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse<j.glisse@xxxxxxxxx>  wrote:
On Wed, May 23, 2012 at 8:34 AM, Christian König
<deathsimple@xxxxxxxxxxx>  wrote:
On 23.05.2012 11:27, Dave Airlie wrote:
On Thu, May 17, 2012 at 7:28 PM,<j.glisse@xxxxxxxxx>    wrote:
So here is an improved patchset, where I split the groundwork necessary
for the dumping into separate patches. The debugfs improvements could
probably be useful to Intel, instead of i915 having its own
debugfs file code.

The lockup dumping public API has been moved into radeon_drm.h.

Stressing the fact again that dumps are self-contained, i.e. they have
all the data needed to be replayed (vertices, indices, shaders, textures,
...).

I would really like to get this into 3.5; the new API is pretty
straightforward and userspace tools can easily be made to convert
it to other formats. The change to the driver is self-contained.

I really don't like introducing this into 3.5 at this stage.

I'd really like a good review of the API and what information we provide
along with how extensible it is.

I'm still not convinced replay is what we want in the field. I know it's
what *you* want, but I think the apitrace stuff in userspace pretty much
covers the replay situation. So I'd have to look at this and see how easy
it makes dissecting command streams etc.

Dave.

I agree that it might not be a good idea to push that into 3.5, since at
least I (and I think Alex as well) haven't had time to look into it yet. On
the other hand, the patches look quite reasonable.

But I still wanted to throw in a requirement from my day-to-day work; maybe
that helps in finding a more general solution:
When we start to work with more parts of the chip, it might be necessary to
dump everything that is currently "in flight". For example, I had a whole
bunch of problems where copying data around with a 3D blit and then missing
a sync between this job and a job on another ring caused a "hiccup" in the
hardware.

I know that this isn't your focus and that is absolutely OK with me, because
the format you are introducing is only used in debugfs and so is not part of
any stable API (at least not in my understanding), but you should still keep
in mind that we might need to extend it in that direction in the future.

Christian.

Note that my format is also designed with that in mind; it can capture IBs
from all rings. The only things I don't think are worth capturing are the
rings themselves, because there would be no way to replay them without
adding some new special API.

I'd like to dump the rings as well. As I said, I'd rather we didn't
limit this to replay, but make it useful for getting as much info out as
possible.

Dave.

The rings will contain very little, like IB scheduling and fences; I don't
see how useful this can be.

In case we have a bug in our ib scheduling or fencing :-0

Dave.

Well, I think we have several kinds of lockup. The most basic one is
userspace sending a broken shader, vertex, or something along those lines.
The more complex one is timing related, like a BO move or some cache
invalidation that didn't happen properly, so the GPU ends up reading either
wrong data or old cached data. I don't see how to capture useful
information for this second case, besides taking a snapshot of memory.

For multi-ring, I agree that dumping the rings might prove useful to spot
inter-ring semaphore deadlocks, or possibly a missing inter-ring
synchronization (but that would be a bad kernel bug).

I don't think that we need the actual data from the rings either (at least
as long as we keep the radeon_ring_* debugfs files). But it would still be
nice to know whether or not there was a sync between the rings. See the
patches I just sent to you (sorry, I actually sent more patches than I
wanted to); storing the new sync_seq array within the debug output should
enable us to actually figure out the dependencies and order between
different IBs.
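
To illustrate the idea, here is a minimal sketch of how a dump-analysis
tool might use a stored sync_seq array to decide whether two IBs were
ordered. It assumes each dumped IB records its ring index, its own fence
sequence number, and a copy of sync_seq taken at submission time. The
struct layout, NUM_RINGS, and the function name are hypothetical and are
not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 3 /* assumption: number of rings covered by the dump */

/* Hypothetical per-IB record as it might appear in a lockup dump. */
struct ib_record {
    unsigned ring;                /* ring the IB was scheduled on */
    uint64_t seq;                 /* fence sequence number of this IB on its ring */
    uint64_t sync_seq[NUM_RINGS]; /* per ring: highest sequence number this IB's
                                     ring had synced to when the IB was submitted */
};

/*
 * Returns true if 'later' is guaranteed to run after 'earlier':
 * either both are on the same ring and 'later' has the higher sequence
 * number, or the ring of 'later' had already synced to at least the
 * sequence number of 'earlier' on the other ring.
 */
bool ib_ordered_after(const struct ib_record *later,
                      const struct ib_record *earlier)
{
    if (later->ring == earlier->ring)
        return later->seq > earlier->seq;
    return later->sync_seq[earlier->ring] >= earlier->seq;
}
```

With records like these, a blit on ring 0 with seq 10 and a dependent job
on ring 1 whose sync_seq[0] is only 9 would be reported as unordered, which
is the kind of missing inter-ring sync described earlier in the thread.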

Cheers,
Christian.
