Re: [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS


 



On 09/10/15 18:26, Chris Wilson wrote:
On Fri, Oct 09, 2015 at 07:14:02PM +0200, Daniel Vetter wrote:
On Fri, Oct 09, 2015 at 10:03:14AM +0100, Tvrtko Ursulin wrote:

On 09/10/15 09:55, Daniel Vetter wrote:
On Fri, Oct 09, 2015 at 09:40:53AM +0100, Chris Wilson wrote:
On Fri, Oct 09, 2015 at 09:48:01AM +0200, Daniel Vetter wrote:
On Thu, Oct 08, 2015 at 10:45:47AM +0100, Tvrtko Ursulin wrote:
The concern is that this isn't how SIGSEGV works: it's a signal that the
thread which made the invalid access gets directly. You never get a SIGSEGV
for a bad access someone else has made. So essentially it's new ABI.

SIGBUS. For which the answer is yes, you can and do get SIGBUS for
actions taken by other processes.
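To illustrate the point above, here is a minimal sketch (not taken from the patch under discussion) of the classic case where a process receives SIGBUS because of something another process did: a shared file mapping whose backing file is truncated by a child process. The helper name sigbus_after_foreign_truncate and the /tmp path are assumptions made for the example, and the BUS_ADRERR result is the behaviour expected on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;
static volatile sig_atomic_t sigbus_code;

/* Record why the kernel raised SIGBUS, then escape the faulting access. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    sigbus_code = info->si_code;
    siglongjmp(sigbus_env, 1);
}

/*
 * Hypothetical helper: returns the si_code of the SIGBUS received,
 * 0 if the access unexpectedly succeeded, or -1 if setup failed.
 */
int sigbus_after_foreign_truncate(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* assumed writable */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile char *map;
    void *addr = MAP_FAILED;
    pid_t child;
    int status, result = -1;

    assert(page > 0);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, page) == 0)
        addr = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out;
    map = addr;
    map[0] = 'x';            /* the mapping is valid at this point */

    child = fork();
    if (child < 0)
        goto out;
    if (child == 0)          /* a different process shrinks the file */
        _exit(truncate(path, 0) == 0 ? 0 : 1);
    if (waitpid(child, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto out;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) < 0)
        goto out;

    if (sigsetjmp(sigbus_env, 1) == 0) {
        char c = map[0];     /* same address as before, now unbacked */
        (void)c;
        result = 0;
    } else {
        result = sigbus_code;
    }
    sigaction(SIGBUS, &old, NULL);
out:
    if (addr != MAP_FAILED)
        munmap(addr, (size_t)page);
    close(fd);
    unlink(path);
    return result;
}
```

The faulting access is still made by the process that gets the signal; what another process changed is whether that access is valid. That differs from the userptr case, where the stale access comes from the GPU.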

Oh right I always forget that SIGBUS aliases with SIGIO. Anyway if
userspace wants SIGIO we just need to provide it with a pollable fd and
then it can use fcntl to make that happen. That's imo a much better api
than unconditionally throwing around signals. Also we already have the
reset stats ioctl to tell userspace that its gpu context is toast. If
anyone wants that to be pollable (or even send SIGIO) I think we should
extend that, with all the usual "needs userspace&igt" stuff on top.
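For reference, the opt-in mechanism described above (a pollable fd plus fcntl) can be sketched roughly as follows. This is only an illustration under stated assumptions: a pipe stands in for whatever pollable fd the driver might expose, and the helper name sigio_on_readable_pipe is made up for the example. It relies on Linux delivering SIGIO for pipes that have O_ASYNC set.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the signal number collected (SIGIO on
 * success), or -1 on failure or timeout.
 */
int sigio_on_readable_pipe(void)
{
    int fds[2];
    sigset_t sigio, prev;
    struct timespec timeout = { .tv_sec = 2, .tv_nsec = 0 };
    struct timespec no_wait = { 0, 0 };
    int flags, sig = -1;

    if (pipe(fds) < 0)
        return -1;

    /* Block SIGIO so it stays pending until we collect it below. */
    sigemptyset(&sigio);
    sigaddset(&sigio, SIGIO);
    sigprocmask(SIG_BLOCK, &sigio, &prev);

    /* Opt in: ask for SIGIO to be sent to us when the fd becomes readable. */
    flags = fcntl(fds[0], F_GETFL);
    if (flags >= 0 &&
        fcntl(fds[0], F_SETOWN, getpid()) == 0 &&
        fcntl(fds[0], F_SETFL, flags | O_ASYNC) == 0 &&
        write(fds[1], "x", 1) == 1)
        sig = sigtimedwait(&sigio, NULL, &timeout);

    /* Opt out again, then drain any leftover SIGIO before unblocking. */
    if (flags >= 0)
        fcntl(fds[0], F_SETFL, flags);
    close(fds[0]);
    close(fds[1]);
    while (sigtimedwait(&sigio, NULL, &no_wait) > 0)
        ;
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return sig;
}
```

The signal is only sent because the process asked for it with F_SETOWN and O_ASYNC; a process that never opts in can still poll() the same fd and receives no signal.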

I don't see that this notification can be optional. Process is confused
about its memory map use so should die. :)

This is not a GPU error/hang - this is the process doing stupid things.

MMU notifiers do not support decision making, otherwise we could say
-ETXTBSY or something on munmap, but we can't. Not even sure that it would
help in all cases, would have to fail clone as well and who knows what.

So what happens if the gpu just keeps using the memory? It'll all be
horribly undefined behaviour and eventually it'll die on an -EFAULT in
execbuf, but does anything else bad happen?

We don't see an EFAULT unless a miracle occurs, and the stale pages
continue to be read/written by other processes (as well as the client).
Horribly undefined behaviour with a misinformation leak.
-Chris

I think SIGBUS would be a good notification. It's the sort of outcome you expect when a privileged thread on the CPU or any sort of DMA-master device incurs an access fault on physical memory or I/O mapped register space. One explanation I found suggests:

	Another reason where SIGBUS can generate is explained below:

	You are currently using a external I/O device by mapping the
	device memory mapping into the system memory (Memory mapped
	I/O). You have used it. And now, you have disconnected it
	gracefully. But, somehow your code is trying to use an
	previously used address still in your code. The result in this
	case will be an SIGBUS, the reason is BUS_ADRERR, "non-existent
	physical address".

See http://cquestion.blogspot.com/2008/03/sigbus-vs-sigsegv.html

In this case, (we assume that) the GPU is going to continue to access the "physical" (PPGTT?) address of the (virtual) memory that the process is trying to revoke its access to. And while it might make sense to remove a buffer from the CPU's mapping while the GPU was still accessing it, it really makes no sense to delete a GTT mapping that the GPU may still (asynchronously) be accessing. So either we have to kill the process's outstanding tasks on the GPU (context-specific reset?) or fail the unmap (and shoot the process for trying to sabotage the GPU?).

Or ... could we decouple the pages? Duplicate them as for copy-on-write, and give one copy to the user process and the other to the GPU? Of course the actual content of the page might be indeterminate if the GPU were writing it while the CPU was taking a copy ... does this make any sense?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



