On Mon, Feb 06, 2012 at 11:59:11PM +0100, Eric Anholt wrote:
> On Mon, 6 Feb 2012 17:15:44 +0100, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Fri, Feb 03, 2012 at 06:02:38PM +0000, Chris Wilson wrote:
> > > On Fri, 3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov at intel.com> wrote:
> > > > This allows to hopefully find out who was responsible for the GPU death.
> > > > We record the 1st and last process to touch each object, to keep track of
> > > > the process which created the object originally and the last process to
> > > > touch it.
> > > >
> > > > To simplify post-mortem analysis, we also search for the processes names
> > > > when gathering the i915_error_state and when peeking at the list of active
> > > > gem objects in debugfs. This is not perfect for tracking all the
> > > > processes, as they can quit or die before their batchbuffers got executed,
> > > > but having to track them during the entire object lifetime would be
> > > > excessively memcpy hungry.
> > >
> > > I think you've slightly missed here. Tracking who created a buffer is
> > > interesting and who last used it, but you really need to also track
> > > on whose behalf the request (i.e. each batch) is executing.
> > >
> > > For the goal of recording creator, you could just use:
> > >
> > >   obj->creator = current ? current->pid : 0;
> > >
> > > in i915_gem_object_init with 0 as the special value for objects created by
> > > the driver outside of process context. And similarly for i915_add_request,
> > > though I'd associate those with the owner of the file_priv. The important
> > > point here is that a buffer may be associated with multiple batches
> > > submitted by one or more clients before a hang is detected, and so unless
> > > the dispatch pid is tracked you do not know who submitted the erroneous
> > > batch. (Even a batch may be submitted more than once by many clients,
> > > given sufficient pathology.)
> > > So adding the request queue to the
> > > i915_error_state would also be interesting, especially with the jiffie
> > > and ring->tail.
> > >
> > > Also note that there is no direct link between i915_gem_fault() and usage
> > > of the object, the point at which you want to add the obj->last_used_by
> > > tracking to is domain management - which catches the usage of CPU
> > > mappings as well as move-to-active.
> >
> > I'll second Chris here - I think the interesting stuff is to add some kind
> > of cheap ownership tracking, not who exactly created the buffer. The
> > latter is imo only really interesting for resource accounting, and that
> > would require it to be somewhat more solid. And we don't do any resource
> > accounting atm anyway.
>
> Having the creator associated with the buffer should be nice. I agree
> that for hang debugging, making the pid association part of the request
> struct makes more sense than tracking it per-object. With those two, I
> don't see much use for "last pwriter/executer" with the buffer.

Could I recommend storing drm_file instead of the PID? That is what I
have, and it is required for forced-throttling. You should be able to get
to a pid from the file descriptor.
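
To make the suggestion concrete, here is a rough user-space sketch of what
"owner on the request, pid derived from the file" could look like. It is not
the i915 code: every sketch_* name is hypothetical, struct sketch_file merely
stands in for drm_file, and it assumes the queue is kept in submission order
with the last completed seqno known at hang time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for drm_file: one per open of the device node. */
struct sketch_file {
	int pid;	/* pid of the process that opened the device */
};

/* Hypothetical request record: one per submitted batch. */
struct sketch_request {
	uint32_t seqno;			/* submission sequence number */
	unsigned long emitted_jiffies;	/* when the request was queued */
	uint32_t tail;			/* ring tail after emitting the request */
	const struct sketch_file *file;	/* owner; NULL if driver-internal */
};

/* Derive the pid from the owning file; 0 means "no process context". */
static int sketch_request_pid(const struct sketch_request *req)
{
	return (req && req->file) ? req->file->pid : 0;
}

/*
 * Return the oldest request that has not completed yet, i.e. the first
 * one whose seqno lies after completed_seqno. Assumes the queue is in
 * submission order. Returns NULL if everything has completed.
 */
static const struct sketch_request *
sketch_find_hung_request(const struct sketch_request *queue, size_t count,
			 uint32_t completed_seqno)
{
	assert(queue != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		/* Signed difference keeps the comparison valid across wrap. */
		if ((int32_t)(queue[i].seqno - completed_seqno) > 0)
			return &queue[i];
	}
	return NULL;
}
```

Keeping the file pointer rather than a bare pid means the same field can serve
both hang attribution and per-client throttling; the pid is looked up only
when the error state is dumped.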