On Thu, 2015-10-08 at 16:57 +0100, Chris Wilson wrote:
> Correct me if I am wrong, but it looks like i915_svm implements the
> lowlevel interface with the hardware, so by convention is intel_svm.c

I think the plan is to drop the driver-mode interface altogether and use
the OS-mode version that I posted a few hours ago. So some of your review
may not apply; other parts might.

> > +	struct task_struct *tsk;
>
> One task? A context can be passed by the device fd to another process.
> Do we inherit the VM along with the context? I don't see anything to
> prevent such.

You still get to deal with this, and yes -- the assigned PASID value
(given by the intel_svm_bind_mm() call) will continue to refer to the VM
of the process which called intel_svm_bind_mm(). Even when it's used from
elsewhere.

Note that you *could* bind the PASID at the time you actually submit the
request, and in that case you could bind it to the VM of the process
which is submitting the request, rather than the process which did the
initial setup.

> > +static void gpu_mm_segv(struct task_struct *tsk, unsigned long address,
> > +			int si_code)
> > +{
> > +	siginfo_t info;
> > +
> > +	/* Need specific signal info here */
> > +	info.si_signo = SIGSEGV;
> > +	info.si_errno = EIO;
> > +	info.si_code = si_code;
> > +	info.si_addr = (void __user *)address;
> > +
> > +	force_sig_info(SIGSEGV, &info, tsk);
>
> force_sig_info() is not exported, ah you builtin i915-svm.c

I'm leaving this with you too, as discussed in a separate thread. The
OS-mode IOMMU support is just returning an appropriate 'failed' response
code to the page request, and letting the endpoint device handle that as
it sees fit.

> > +	spin_lock(&dev_priv->svm.lock);
> > +	ctx = dev_priv->svm.pasid_ctx[desc.pasid];
> > +	tsk = ctx->tsk;
> > +	mm = tsk->mm;
> > +	address = desc.addr << PAGE_SHIFT;
> > +	ringbuf = ctx->engine[RCS].ringbuf;
> > +	spin_unlock(&dev_priv->svm.lock);
>
> All of the above can disappear at anytime after the unlock?

Answering for my own code... I don't touch any of the gfx context stuff,
obviously, and I don't even use the task. I do have a reference on the mm
and it isn't going away.

> > +
> > +	down_read_trylock(&mm->mmap_sem);
> > +	vma = find_extend_vma(mm, address);
> > +	if (!vma || address < vma->vm_start) {
> > +		DRM_ERROR("bad VMA or address out of range\n");

Um... Jesse, I used that same 'address < vma->vm_start' check in my own
version. But is that going to prevent us from allowing the GPU to grow
the stack VMA downwards? I note the AMD one does it too.
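For reference, the shape of the OS-mode fault-service path I keep
referring to is roughly as below. This is only a sketch of the ordering
(mm reference, mmap_sem, find_extend_vma(), handle_mm_fault(), then a
response to the page request); svm_service_page_request() and the
prq_resp values are placeholder names rather than anything in the posted
patches:

#include <linux/mm.h>
#include <linux/sched.h>

/* Placeholder response codes; the real values for page request
 * responses come from the VT-d spec, not from this sketch. */
enum prq_resp { PRQ_RESP_SUCCESS, PRQ_RESP_INVALID };

static enum prq_resp svm_service_page_request(struct mm_struct *mm,
					      unsigned long address,
					      bool write)
{
	struct vm_area_struct *vma;
	enum prq_resp resp = PRQ_RESP_INVALID;

	/* Don't touch an mm that is already being torn down; otherwise
	 * take a reference so it can't go away while we fault. */
	if (!atomic_inc_not_zero(&mm->mm_users))
		return resp;

	down_read(&mm->mmap_sem);

	/* find_extend_vma() will try to grow a VM_GROWSDOWN stack before
	 * returning, which is where the question about the
	 * 'address < vma->vm_start' check above comes in. */
	vma = find_extend_vma(mm, address);
	if (!vma || address < vma->vm_start)
		goto out;

	if (handle_mm_fault(mm, vma, address,
			    write ? FAULT_FLAG_WRITE : 0) & VM_FAULT_ERROR)
		goto out;

	resp = PRQ_RESP_SUCCESS;
out:
	up_read(&mm->mmap_sem);
	mmput(mm);

	/* No SIGSEGV from here: on failure we just give the IOMMU an
	 * 'invalid' response for the page request and let the endpoint
	 * device handle it. */
	return resp;
}

The page-request handler would look up the mm for the reported PASID,
call something like the above, and then queue the response descriptor
back to the hardware.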
> > +/* Make sure GPU writes can't hit the mm that's about to go away */
> > +static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> > +{
> > +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> > +						   notifier);
> > +	struct drm_i915_private *dev_priv = ims->dev_priv;
> > +	struct drm_device *dev = dev_priv->dev;
> > +	struct intel_context *ctx;
> > +
> > +	/*
> > +	 * Wait for any outstanding activity and unbind the mm.  Since
> > +	 * each context has its own ring, we can simply wait for the ring
> > +	 * to idle before invalidating the PASID and flushing the TLB.
> > +	 */
> > +	mutex_lock(&dev->struct_mutex);
> > +	list_for_each_entry(ctx, &ims->context_list, mm_list) {
> > +		intel_ring_idle(ctx->engine[RCS].ringbuf->ring);
> > +	}
> > +
> > +	intel_iommu_tlb_flush(dev_priv->dev);
> > +	mutex_unlock(&dev->struct_mutex);
>
> Erm, what! So you halt the GPU everytime? But you've already
> invalidated the shadow PTE -- ah, invalidate-range looks to be a wip.

That's the callback for when the MM goes away -- on process exit, or
when we unbind the PASID. But it's still suboptimal, and I am strongly
resisting the temptation to have a callback from the OS-mode code into
the device driver in this code path.

For the normal case of unbinding the PASID cleanly, device drivers are
expected to flush their queue of requests for a given PASID *before*
calling intel_svm_unbind_pasid(). That includes contexts which might be
stalled, waiting for a page request. So you're expected to do any ring
management in advance.

If a process exits uncleanly, exit_mm() happens before exit_files(). So
the MM has gone before the i915 driver gets to clean up. In that case, we
end up in the intel_mm_release() function and it just clears the page
tables and flushes the IOTLB. Accesses will take faults, and the proper
cleanup will happen shortly.

> > +static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
> > +			     unsigned long address, pte_t pte)
> > +{
> > +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> > +						   notifier);
> > +	struct drm_i915_private *dev_priv = ims->dev_priv;
> > +	struct drm_device *dev = dev_priv->dev;
> > +
> > +	struct intel_context *ctx;
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +	list_for_each_entry(ctx, &ims->context_list, mm_list)
> > +		intel_flush_page_locked(dev, ctx->pasid, address);
> > +	mutex_unlock(&dev->struct_mutex);
>
> Suggests you really want a ims->spinlock for context_list instead.

We now use a single PASID for all contexts, so we don't need any list
traversal like this anyway.

> > +struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
> > +				      struct intel_context *ctx)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	struct intel_mm_struct *ims;
> > +	struct mmu_notifier *mn;
> > +	int ret;
> > +
> > +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> > +
> > +	mn = mmu_find_ops(current->mm, &intel_mmuops);
>
> Magic function, I am missing its definition

A previous patch in the series exports this. It basically just checks if
there is *already* a mmu_notifier registered with the same
mmu_notifier_ops structure, and returns it if so.

Because of the hairy locking rules around the mmu_notifier, I've done
this differently in my own code -- using idr_for_each_entry() on the IDR
I use to allocate PASIDs. That's fine while there is a relatively low
number of PASIDs allocated (i.e. a low number of *processes* using
PASIDs, since there's one PASID per process). But if we end up with
lots, that idr_for_each_entry() might become burdensome and we might
need to take another look.

> > +	if (mn) {
> > +		ims = container_of(mn, struct intel_mm_struct, notifier);
> > +		kref_get(&ims->kref);
> > +		goto out;
> > +	}
> > +
> > +	ims = kzalloc(sizeof(*ims), GFP_KERNEL);
> > +	if (!ims) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +	INIT_LIST_HEAD(&ims->context_list);
> > +
> > +	ims->notifier.ops = &intel_mmuops;
> > +
> > +	ret = mmu_notifier_register(&ims->notifier, current->mm);
>
> This has lock inversion between struct_mutex and mm->mmap_sem.

I think this is gone, because I no longer take my equivalent pasid_mutex
within the mmu_notifier callbacks. Unless I'm missing something.
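To make that concrete, the idr_for_each_entry() lookup I describe above
is along the lines of the sketch below. It's illustrative only: struct
intel_svm, pasid_idr and pasid_mutex are stand-in names rather than the
posted code, and the real per-PASID state carries more than this. The
relevant point for the locking question is that pasid_mutex is only
taken on the bind/unbind path, never from inside the mmu_notifier
callbacks:

#include <linux/idr.h>
#include <linux/kref.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

/* Stand-in for the per-PASID state; names are illustrative. */
struct intel_svm {
	struct mmu_notifier notifier;
	struct mm_struct *mm;
	struct kref kref;
	int pasid;
};

static DEFINE_MUTEX(pasid_mutex);	/* serialises bind/unbind only */
static DEFINE_IDR(pasid_idr);		/* PASID -> struct intel_svm */

/* Walk the allocated PASIDs looking for one already bound to this mm.
 * This is O(number of PASIDs), which is fine while that number stays
 * small (one PASID per process). */
static struct intel_svm *intel_svm_find_mm(struct mm_struct *mm)
{
	struct intel_svm *svm;
	int pasid;

	mutex_lock(&pasid_mutex);
	idr_for_each_entry(&pasid_idr, svm, pasid) {
		if (svm->mm == mm) {
			kref_get(&svm->kref);
			mutex_unlock(&pasid_mutex);
			return svm;
		}
	}
	mutex_unlock(&pasid_mutex);
	return NULL;
}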
-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@xxxxxxxxx                              Intel Corporation