On 5/22/19 6:25 PM, Jason Gunthorpe wrote:
On Wed, May 22, 2019 at 05:54:17PM -0700, Ralph Campbell wrote:
On 5/22/19 4:36 PM, Jason Gunthorpe wrote:
On Mon, May 06, 2019 at 04:35:14PM -0700, rcampbell@xxxxxxxxxx wrote:
From: Ralph Campbell <rcampbell@xxxxxxxxxx>
The last reference to struct hmm may be released long after the mm_struct
is destroyed because the struct hmm_mirror memory may be part of a
device driver open file private data pointer. The file descriptor close
is usually after the mm_struct is destroyed in do_exit(). This is a good
reason for making struct hmm a kref_t object [1] since its lifetime spans
the life time of mm_struct and struct hmm_mirror.
The fix is to not use hmm->mm in hmm_free() and to clear mm->hmm and
hmm->mm pointers in hmm_destroy() when the mm_struct is
destroyed.
I think the right way to fix this is to have the struct hmm hold a
mmgrab() on the mm so its memory cannot go away until all of the hmm
users release the struct hmm, hmm_ranges/etc
Then we can properly use mmget_not_zero() instead of the racy/abnormal
'if (hmm->xmm == NULL || hmm->dead)' pattern (see the other
thread). Actually looking at this, all these tests look very
questionable. If we hold the mmget() for the duration of the range
object, as Jerome suggested, then they all get deleted.
That just leaves mmu_notifier_unregister_no_relase() as the remaining
user of hmm->mm (everyone else is trying to do range->mm) - and it
looks like it currently tries to call
mmu_notifier_unregister_no_release on a NULL hmm->mm and crashes :(
Holding the mmgrab fixes this as we can safely call
mmu_notifier_unregister_no_relase() post exit_mmap on a grab'd mm.
Also we can delete the hmm_mm_destroy() intrustion into fork.c as it
can't be called when the mmgrab is active.
This is the basic pattern we used in ODP when working with mmu
notifiers, I don't know why hmm would need to be different.
+1 for the mmgrab() approach. I have never been able to see how these
various checks can protect anything, and refcounting it into place definitely
sounds like the right answer.
thanks,
--
John Hubbard
NVIDIA