On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>> +
>>>> +	ret = handle_mm_fault(mm, vma, address,
>>>> +			      desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>> +	if (ret & VM_FAULT_ERROR) {
>>>> +		gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>> +		goto out_unlock;
>>>> +	}
>>>> +
>>>
>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>> IOMMU code?
>>>
>>> Can you instead just let the fault handler return an appropriate
>>> failure code to the IOMMU request queue and then deal with the
>>> resulting error on the i915 device side?
>>
>> I'm not sure if we get enough info on the i915 side to handle it
>> reasonably, we'll have to test that out.
>
> We do know precisely which context blew up, but without the TDR work we
> can't yet just kill the offender selective without affecting the other
> active gpu contexts.

How?  The notification from the IOMMU queue is asynchronous...

> But besides that I really don't see a reason why we need to kill the
> process if the gpu faults. After all if a thread sigfaults then signal
> goes to that thread and not some random one (or the one thread that forked
> the thread that blew up). And we do have interfaces to tell userspace that
> something bad happened with the gpu work it submitted.

We will send a signal, just as in the thread case.  That generally
kills the process, but the process is free to install a handler and try
to do something of course.

The trouble is that a fault like this indicates a bug, just as it would
in the multithreaded case (processors manipulating the address space
without locking for example, or a use after free, or a simple bad
pointer reference).

> Chris made a similar patch for userptr and I didn't like that one either.
> Worst case userspace has a special SEGV handler and then things really go
> down badly when that handler gets triggered at an unexpected place.

Not sure what you're suggesting as an alternative; just let things keep
running somehow?

Jesse