On Wed, Sep 09, 2015 at 04:20:08PM +0100, Tvrtko Ursulin wrote: > > On 09/09/2015 04:08 PM, Chris Wilson wrote: > >On Wed, Sep 09, 2015 at 03:45:40PM +0100, Tvrtko Ursulin wrote: > >>On 08/10/2015 09:51 AM, Chris Wilson wrote: > >>>Whilst discussing possible ways to trigger an invalidate_range on a > >>>userptr with an aliased GGTT mmapping (and so cause a struct_mutex > >>>deadlock), the conclusion is that we can, and we must, prevent any > >>>possible deadlock by avoiding taking the mutex at all during > >>>invalidate_range. This has numerous advantages all of which stem from > >>>avoid the sleeping function from inside the unknown context. In > >>>particular, it simplifies the invalidate_range because we no longer > >>>have to juggle the spinlock/mutex and can just hold the spinlock > >>>for the entire walk. To compensate, we have to make get_pages a bit more > >>>complicated in order to serialise with a pending cancel_userptr worker. > >>>As we hold the struct_mutex, we have no choice but to return EAGAIN and > >>>hope that the worker is then flushed before we retry after reacquiring > >>>the struct_mutex. > >>> > >>>The important caveat is that the invalidate_range itself is no longer > >>>synchronous. There exists a small but definite period in time in which > >>>the old PTE's page remain accessible via the GPU. Note however that the > >>>physical pages themselves are not invalidated by the mmu_notifier, just > >>>the CPU view of the address space. The impact should be limited to a > >>>delay in pages being flushed, rather than a possibility of writing to > >>>the wrong pages. The only race condition that this worsens is remapping > >>>an userptr active on the GPU where fresh work may still reference the > >>>old pages due to struct_mutex contention. Given that userspace is racing > >>>with the GPU, it is fair to say that the results are undefined. > >>> > >>>v2: Only queue (and importantly only take one refcnt) the worker once. > >> > >>This one I looked at at the time of previous posting and it looked > >>fine, minus one wrong line of thinking of mine. On a brief look it > >>still looks good, so: > >> > >>Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >>I assume Michał has run all these through the relevant test cases? > >> > >>Slightly related, I now worry about the WARN_ONs in > >>__cancel_userptr__worker since they look to be triggerable by > >>malicious userspace which is not good. > > > >They could always be I thought, if you could somehow pin the userptr > >into a hardware register and then unmap the vma. That is a scary thought > >and one I would like a WARN for. That should be the only way, and I shudder > >at the prospect of working out who to send the SIGBUS to. > > Is it not enough to submit work to the GPU and while it is running > engineer a lot of signals and munmap? No, we block signals inside the worker, which should reduce it down to EINVAL/EBUSY or EIO from unbind (and a subsequent WARN from put). -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx