Sending once more as text. Am 17.10.19 um 18:26 schrieb Yang, Philip: > On 2019-10-17 4:54 a.m., Christian König wrote: >> Am 16.10.19 um 18:04 schrieb Jason Gunthorpe: >>> On Wed, Oct 16, 2019 at 10:58:02AM +0200, Christian König wrote: >>>> Am 15.10.19 um 20:12 schrieb Jason Gunthorpe: >>>>> From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> >>>>> >>>>> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, >>>>> hfi1, >>>>> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where >>>>> they only use invalidate_range_start/end and immediately check the >>>>> invalidating range against some driver data structure to tell if the >>>>> driver is interested. Half of them use an interval_tree, the others are >>>>> simple linear search lists. >>>>> >>>>> Of the ones I checked they largely seem to have various kinds of races, >>>>> bugs and poor implementation. This is a result of the complexity in how >>>>> the notifier interacts with get_user_pages(). It is extremely >>>>> difficult to >>>>> use it correctly. >>>>> >>>>> Consolidate all of this code together into the core mmu_notifier and >>>>> provide a locking scheme similar to hmm_mirror that allows the user to >>>>> safely use get_user_pages() and reliably know if the page list still >>>>> matches the mm. >>>> That sounds really good, but could you outline for a moment how that is >>>> archived? >>> It uses the same basic scheme as hmm and rdma odp, outlined in the >>> revisions to hmm.rst later on. >>> >>> Basically, >>> >>> seq = mmu_range_read_begin(&mrn); >>> >>> // This is a speculative region >>> .. get_user_pages()/hmm_range_fault() .. >> How do we enforce that this get_user_pages()/hmm_range_fault() doesn't >> see outdated page table information? >> >> In other words how the the following race prevented: >> >> CPU A CPU B >> invalidate_range_start() >> mmu_range_read_begin() >> get_user_pages()/hmm_range_fault() >> Updating the ptes >> invalidate_range_end() >> >> >> I mean get_user_pages() tries to circumvent this issue by grabbing a >> reference to the pages in question, but that isn't sufficient for the >> SVM use case. >> >> That's the reason why we had this horrible solution with a r/w lock and >> a linked list of BOs in an interval tree. >> >> Regards, >> Christian. > get_user_pages/hmm_range_fault() and invalidate_range_start() both are > called while holding mm->map_sem, so they are always serialized. Not even remotely. For calling get_user_pages()/hmm_range_fault() you only need to hold the mmap_sem in read mode. And IIRC invalidate_range_start() is sometimes called without holding the mmap_sem at all. So again how are they serialized? Regards, Christian. > > Philip >>> // Result cannot be derferenced >>> >>> take_lock(driver->update); >>> if (mmu_range_read_retry(&mrn, range.notifier_seq) { >>> // collision! The results are not correct >>> goto again >>> } >>> >>> // no collision, and now under lock. Now we can de-reference the >>> pages/etc >>> // program HW >>> // Now the invalidate callback is responsible to synchronize against >>> changes >>> unlock(driver->update) >>> >>> Basically, anything that was using hmm_mirror correctly transisions >>> over fairly trivially, just with the modification to store a sequence >>> number to close that race described in the hmm commit. >>> >>> For something like AMD gpu I expect it to transition to use dma_fence >>> from the notifier for coherency right before it unlocks driver->update. >>> >>> Jason >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx