Re: [LSF/MM ATTEND] HMM (heterogeneous memory manager) and GPU

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2016-02-03 at 10:13 +0200, Oded Gabbay wrote:
> 
> > So on process exit, the MM doesn't die because the PASID binding still
> > exists. The VMA of the mmap doesn't die because the MM still exists. So
> > the underlying file remains open because the VMA still exists. And the
> > PASID binding thus doesn't die because the file is still open.
> >
> Why connect the PASID to the FD in the first place ?
> Why not tie everything to the MM ?

That's actually a question for the device driver in question, of
course; it's not the generic SVM support code which chooses *when* to
bind/unbind PASIDs. We just provide those functions for the driver to
call.

But the answer is that that's the normal resource tracking model.
Resources hang off the file and are cleared up when the file is closed.

(And exit_files() is called later than exit_mm()).

> > I've posted a patch¹ which moves us closer to the amd_iommu_v2 model,
> > although I'm still *strongly* resisting the temptation to call out into
> > device driver code from the mmu_notifier's release callback.
> 
> You mean you are resisting doing this (taken from amdkfd):
> 
> --------------
> static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
> .release = kfd_process_notifier_release,
> };
> 
> process->mmu_notifier.ops = &kfd_process_mmu_notifier_ops;
> -----------
> 
> Why, if I may ask ?

The KISS principle, especially as it relates to device drivers.
We just Do Not Want random device drivers being called in that context.

It's OK for amdkfd where you have sufficient clue to deal with it —
it's more than "just a device driver".

But when we get discrete devices with PASID support (and the required
TLP prefix support in our root ports at last!) we're going to see SVM
supported in many more device drivers, and we should make it simple.

Having the mmu_notifier release callback exposed to drivers is going to
strongly encourage them to do the WRONG thing, because they need to
interact with their hardware and *wait* for the PASID to be entirely
retired through the pipeline before they tell the IOMMU to flush it.

The patch at http://www.spinics.net/lists/linux-mm/msg100230.html
addresses this by clearing the PASID from the PASID table (in core
IOMMU code) when the process exits so that all subsequent accesses to
that PASID then take faults. The device driver can then clean up its
binding for that PASID in its own time.

It is a fairly fundamental rule that faulting access to *one* PASID
should not adversely affect behaviour for *other* PASIDs, of course.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@xxxxxxxxx                              Intel Corporation

Attachment: smime.p7s
Description: S/MIME cryptographic signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]