On Fri, Oct 09, 2015 at 09:28:37AM +0200, Daniel Vetter wrote:
> On Thu, Oct 08, 2015 at 11:46:08PM +0100, David Woodhouse wrote:
> > On Thu, 2015-10-08 at 12:29 +0100, Tomas Elf wrote:
> > > 
> > > Could someone clarify what this means from the TDR point of view,
> > > please? When you say "context blew up" I'm guessing that you mean that
> > > some context caused the fault handler to get involved somehow?
> > > 
> > > Does this imply that the offending context will hang and the driver will
> > > have to detect this hang? If so, then yes - if we have the per-engine
> > > hang recovery mode as part of the upcoming TDR work in place, then we
> > > could handle it by stepping over the offending batch buffer and moving
> > > on with a minimum of side-effects on the rest of the driver/GPU.
> > 
> > I don't think the context does hang.
> > 
> > I've made the page-request code artificially fail and report that it
> > was an invalid page fault. The gem_svm_fault test seems to complete
> > (albeit complaining that the test failed). Whereas if I just don't
> > service the page-request at all, *then* the GPU hang is detected.
> > 
> > I haven't actually looked at precisely what *is* happening.
> 
> Hm, if this still works the same way as on older platforms, then pagefaults
> just read all 0 and writes from the gpu go nowhere. That also explains the
> ever-increasing values of the CS execution pointer, since it's busy churning
> through 48b worth of address space filled with MI_NOP. I'd have hoped our hw
> would do better than that with svm ...
> 
> If there's really no way to make it hang when we complete the fault, then I
> guess we'll have to hang it by not completing. Otherwise we'll have to
> roll our own fault detection code right from the start.

s/detection/handling/ I meant ofc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx