On Fri, Jun 07, 2019 at 02:37:07PM -0700, Ralph Campbell wrote: > > On 6/6/19 11:44 AM, Jason Gunthorpe wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > hmm_release() is called exactly once per hmm. ops->release() cannot > > accidentally trigger any action that would recurse back onto > > hmm->mirrors_sem. > > > > This fixes a use after-free race of the form: > > > > CPU0 CPU1 > > hmm_release() > > up_write(&hmm->mirrors_sem); > > hmm_mirror_unregister(mirror) > > down_write(&hmm->mirrors_sem); > > up_write(&hmm->mirrors_sem); > > kfree(mirror) > > mirror->ops->release(mirror) > > > > The only user we have today for ops->release is an empty function, so this > > is unambiguously safe. > > > > As a consequence of plugging this race drivers are not allowed to > > register/unregister mirrors from within a release op. > > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > I agree with the analysis above but I'm not sure that release() will > always be an empty function. It might be more efficient to write back > all data migrated to a device "in one pass" instead of relying > on unmap_vmas() calling hmm_start_range_invalidate() per VMA. Sure, but it should not be allowed to recurse back to hmm_mirror_unregister. > I think the bigger issue is potential deadlocks while calling > sync_cpu_device_pagetables() and tasks calling hmm_mirror_unregister(): > > Say you have three threads: > - Thread A is in try_to_unmap(), either without holding mmap_sem or with > mmap_sem held for read. > - Thread B has some unrelated driver calling hmm_mirror_unregister(). > This doesn't require mmap_sem. > - Thread C is about to call migrate_vma(). > > Thread A Thread B Thread C > try_to_unmap hmm_mirror_unregister migrate_vma > hmm_invalidate_range_start > down_read(mirrors_sem) > down_write(mirrors_sem) > // Blocked on A > device_lock > device_lock > // Blocked on C > migrate_vma() > hmm_invalidate_range_s > down_read(mirrors_sem) > // Blocked on B > // Deadlock Oh... you know I didn't know this about rwsems in linux that they have a fairness policy for writes to block future reads.. Still, at least as things are designed, the driver cannot hold a lock it obtains under sync_cpu_device_pagetables() and nest other things in that lock. It certainly can't recurse back into any mmu notifiers while holding that lock. (as you point out) The lock in sync_cpu_device_pagetables() needs to be very narrowly focused on updating device state only. So, my first reaction is that the driver in thread C is wrong, and needs a different locking scheme. I think you'd have to make a really good case that there is no alternative for a driver.. > Perhaps we should consider using SRCU for walking the mirror->list? It means the driver has to deal with races like in this patch description. At that point there is almost no reason to insert hmm here, just use mmu notifiers directly. Drivers won't get this right, it is too hard. Jason