Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 16.10.19 um 18:04 schrieb Jason Gunthorpe:
On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote:
Am 15.10.19 um 20:12 schrieb Jason Gunthorpe:
From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>

8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
they only use invalidate_range_start/end and immediately check the
invalidating range against some driver data structure to tell if the
driver is interested. Half of them use an interval_tree, the others are
simple linear search lists.

Of the ones I checked they largely seem to have various kinds of races,
bugs and poor implementation. This is a result of the complexity in how
the notifier interacts with get_user_pages(). It is extremely difficult to
use it correctly.

Consolidate all of this code together into the core mmu_notifier and
provide a locking scheme similar to hmm_mirror that allows the user to
safely use get_user_pages() and reliably know if the page list still
matches the mm.
That sounds really good, but could you outline for a moment how that is
archived?
It uses the same basic scheme as hmm and rdma odp, outlined in the
revisions to hmm.rst later on.

Basically,

  seq = mmu_range_read_begin(&mrn);

  // This is a speculative region
  .. get_user_pages()/hmm_range_fault() ..

How do we enforce that this get_user_pages()/hmm_range_fault() doesn't see outdated page table information?

In other words how the the following race prevented:

CPU A CPU B
invalidate_range_start()
      mmu_range_read_begin()
      get_user_pages()/hmm_range_fault()
Updating the ptes
invalidate_range_end()


I mean get_user_pages() tries to circumvent this issue by grabbing a reference to the pages in question, but that isn't sufficient for the SVM use case.

That's the reason why we had this horrible solution with a r/w lock and a linked list of BOs in an interval tree.

Regards,
Christian.

  // Result cannot be derferenced

  take_lock(driver->update);
  if (mmu_range_read_retry(&mrn, range.notifier_seq) {
     // collision! The results are not correct
     goto again
  }

  // no collision, and now under lock. Now we can de-reference the pages/etc
  // program HW
  // Now the invalidate callback is responsible to synchronize against changes
  unlock(driver->update)

Basically, anything that was using hmm_mirror correctly transisions
over fairly trivially, just with the modification to store a sequence
number to close that race described in the hmm commit.

For something like AMD gpu I expect it to transition to use dma_fence
from the notifier for coherency right before it unlocks driver->update.

Jason
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux