On Sun, Oct 20, 2019 at 02:21:42PM +0000, Koenig, Christian wrote: > Am 18.10.19 um 22:36 schrieb Jason Gunthorpe: > > On Thu, Oct 17, 2019 at 04:47:20PM +0000, Koenig, Christian wrote: > > > >>> get_user_pages/hmm_range_fault() and invalidate_range_start() both are > >>> called while holding mm->map_sem, so they are always serialized. > >> Not even remotely. > >> > >> For calling get_user_pages()/hmm_range_fault() you only need to hold the > >> mmap_sem in read mode. > > Right > > > >> And IIRC invalidate_range_start() is sometimes called without holding > >> the mmap_sem at all. > > Yep > > > >> So again how are they serialized? > > The 'driver lock' thing does it, read the hmm documentation, the hmm > > approach is basically the only approach that was correct of all the > > drivers.. > > Well that's what I've did, but what HMM does still doesn't looks correct > to me. It has a bug, but the basic flow seems to work. https://patchwork.kernel.org/patch/11191 > > So long as the 'driver lock' is held the range cannot become > > invalidated as the 'driver lock' prevents progress of invalidation. > > Correct, but the problem is it doesn't wait for ongoing operations to > complete. > > See I'm talking about the following case: > > Thread A Thread B > invalidate_range_start() > mmu_range_read_begin() > get_user_pages()/hmm_range_fault() > grab_driver_lock() > Updating the ptes > invalidate_range_end() > > As far as I can see in invalidate_range_start() the driver lock is taken > to make sure that we can't start any invalidation while the driver is > using the pages for a command submission. Again, this uses the seqlock like scheme *and* the driver lock. In this case after grab_driver_lock() mmu_range_read_retry() will return false if Thread A has progressed to 'updating the ptes. For instance here is how the concurrency resolves for retry: CPU1 CPU2 seq = mmu_range_read_begin() invalidate_range_start() invalidate_seq++ get_user_pages()/hmm_range_fault() grab_driver_lock() ungrab_driver_lock() grab_driver_lock() seq != invalidate_seq, so retry Updating the ptes invalidate_range_end() invalidate_seq++ And here is how it resolves for blocking: CPU1 CPU2 seq = mmu_range_read_begin() invalidate_range_start() get_user_pages()/hmm_range_fault() grab_driver_lock() seq == invalidate_seq, so conitnue invalidate_seq++ ungrab_driver_lock() grab_driver_lock() // Cannot progress to 'Updating the ptes' while the drive_lock is held ungrab_driver_lock() Updating the ptes invalidate_range_end() invalidate_seq++ For the above I've simplified the mechanics of the invalidate_seq, you need to look through the patch to see how it actually works. > Well we don't update the seqlock after the update to the protected data > structure (the page table) happened, but rather before that. ??? This is what mn_itree_inv_end() does, it is called by invalidate_range_end > That doesn't looks like the normal patter for a seqlock to me and as far > as I can see that is quite a bug in the HMM design/logic. Well, hmm has a bug because it doesn't use a seqlock pattern, see the above URL. One of the motivations for this work is to squash that bug by adding a seqlock like pattern. But the basic hmm flow and collision-retry approach seems sound. Do you see a problem with this patch? Jason