On Fri, 3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov at intel.com> wrote:
> This allows us, hopefully, to find out who was responsible for the GPU death.
> We record the first and last process to touch each object, to keep track of
> the process which originally created the object and the last process to
> touch it.
>
> To simplify post-mortem analysis, we also look up the process names
> when gathering the i915_error_state and when peeking at the list of active
> gem objects in debugfs. This is not perfect for tracking all the
> processes, as they can quit or die before their batchbuffers are executed,
> but tracking them during the entire object lifetime would be
> excessively memcpy hungry.

I think you've slightly missed the mark here. Tracking who created a
buffer and who last used it is interesting, but you really also need to
track on whose behalf each request (i.e. each batch) is executing.

To record the creator, you could simply use

	obj->creator = current ? current->pid : 0;

in i915_gem_object_init, with 0 as the special value for objects created
by the driver outside of process context. Do the same in i915_add_request,
though there I'd associate the request with the owner of the file_priv.

The important point is that a buffer may be associated with multiple
batches submitted by one or more clients before a hang is detected, so
unless the dispatching pid is tracked you do not know who submitted the
erroneous batch. (Given sufficient pathology, even a single batch may be
submitted more than once, by many clients.) So adding the request queue to
i915_error_state would also be useful, especially with each request's
jiffies timestamp and ring->tail.

Also note that there is no direct link between i915_gem_fault() and use of
the object. The place to add the obj->last_used_by tracking is domain
management, which catches use through CPU mappings as well as
move-to-active.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre