On 6/6/19 11:44 AM, Jason Gunthorpe wrote:
From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.
This fixes a use-after-free race of the form:
     CPU0                                   CPU1
                                        hmm_release()
                                          up_write(&hmm->mirrors_sem);
hmm_mirror_unregister(mirror)
 down_write(&hmm->mirrors_sem);
 up_write(&hmm->mirrors_sem);
 kfree(mirror)
                                          mirror->ops->release(mirror)
The only user we have today for ops->release is an empty function, so this
is unambiguously safe.
As a consequence of plugging this race, drivers are not allowed to
register/unregister mirrors from within a release op.
Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
I agree with the analysis above, but I'm not sure that release() will
always be an empty function. It might be more efficient to write back
all data migrated to a device "in one pass" instead of relying
on unmap_vmas() calling hmm_start_range_invalidate() per VMA.
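
For instance (purely a hypothetical sketch; dummy_device,
dummy_mirror_release() and dummy_device_evict_all() are made-up names,
not an existing driver), a non-empty release() doing a one-pass
writeback might look like:

/*
 * Hypothetical sketch: flush everything the device still owns back to
 * system memory in one pass when the address space is torn down,
 * instead of relying on unmap_vmas() invalidating one VMA at a time.
 */
static void dummy_mirror_release(struct hmm_mirror *mirror)
{
	struct dummy_device *ddev =
		container_of(mirror, struct dummy_device, mirror);

	/* One-pass writeback of all device-resident pages. */
	dummy_device_evict_all(ddev);

	/*
	 * Note: with the patch above, hmm_release() holds mirrors_sem
	 * for read around this callback, so calling
	 * hmm_mirror_unregister() from here would self-deadlock on the
	 * write lock.
	 */
}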
I think the bigger issue is the potential for deadlock between the
sync_cpu_device_pagetables() invalidation path and tasks calling
hmm_mirror_unregister():
Say you have three threads:
- Thread A is in try_to_unmap(), either without holding mmap_sem or with
mmap_sem held for read.
- Thread B has some unrelated driver calling hmm_mirror_unregister().
This doesn't require mmap_sem.
- Thread C is about to call migrate_vma().
Thread A                        Thread B                       Thread C
try_to_unmap()                  hmm_mirror_unregister()        migrate_vma()
----------------------          -----------------------        ----------------------
hmm_invalidate_range_start()
down_read(mirrors_sem)
                                down_write(mirrors_sem)
                                // Blocked on A
                                                               device_lock()
device_lock()
// Blocked on C
                                                               migrate_vma()
                                                               hmm_invalidate_range_start()
                                                               down_read(mirrors_sem)
                                                               // Blocked on B
                                                               // Deadlock

(Thread C's down_read() blocks even though A only holds mirrors_sem for
read, because rwsems queue new readers behind a waiting writer.)
Perhaps we should consider using SRCU for walking the mirror->list?
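
Something along these lines, perhaps (rough sketch only; mirrors_srcu
and mirrors_lock would be new fields in struct hmm, and the range
bookkeeping/WARN_ONs from the current code are omitted):

static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
	struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
	struct hmm_mirror *mirror;
	int idx;

	/* SRCU readers never block, so this cannot queue behind an unregister. */
	idx = srcu_read_lock(&hmm->mirrors_srcu);
	list_for_each_entry_rcu(mirror, &hmm->mirrors, list) {
		if (mirror->ops->release)
			mirror->ops->release(mirror);
	}
	srcu_read_unlock(&hmm->mirrors_srcu, idx);

	hmm_put(hmm);
}

void hmm_mirror_unregister(struct hmm_mirror *mirror)
{
	struct hmm *hmm = mirror->hmm;

	spin_lock(&hmm->mirrors_lock);
	list_del_rcu(&mirror->list);
	spin_unlock(&hmm->mirrors_lock);

	/* Wait for any concurrent walkers before the caller frees the mirror. */
	synchronize_srcu(&hmm->mirrors_srcu);
	hmm_put(hmm);
}

The cost is that unregister has to wait out a grace period, but the
invalidation paths would never block behind a pending writer, which
would break the A/B/C cycle above.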
---
mm/hmm.c | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index 709d138dd49027..3a45dd3d778248 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -136,26 +136,16 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
WARN_ON(!list_empty(&hmm->ranges));
mutex_unlock(&hmm->lock);
- down_write(&hmm->mirrors_sem);
- mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
- list);
- while (mirror) {
- list_del_init(&mirror->list);
- if (mirror->ops->release) {
- /*
- * Drop mirrors_sem so the release callback can wait
- * on any pending work that might itself trigger a
- * mmu_notifier callback and thus would deadlock with
- * us.
- */
- up_write(&hmm->mirrors_sem);
+ down_read(&hmm->mirrors_sem);
+ list_for_each_entry(mirror, &hmm->mirrors, list) {
+ /*
+ * Note: The driver is not allowed to trigger
+ * hmm_mirror_unregister() from this thread.
+ */
+ if (mirror->ops->release)
mirror->ops->release(mirror);
- down_write(&hmm->mirrors_sem);
- }
- mirror = list_first_entry_or_null(&hmm->mirrors,
- struct hmm_mirror, list);
}
- up_write(&hmm->mirrors_sem);
+ up_read(&hmm->mirrors_sem);
hmm_put(hmm);
}
@@ -287,7 +277,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
struct hmm *hmm = mirror->hmm;
down_write(&hmm->mirrors_sem);
- list_del_init(&mirror->list);
+ list_del(&mirror->list);
up_write(&hmm->mirrors_sem);
hmm_put(hmm);
memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));