Hi Christian,
Yes, I agree. I am working on patch 2 to replace get_user_page
with HMM. One problem is that in the current gfx path we check
mmu_invalidations multiple times in the amdgpu_cs_ioctl() path:
after get_user_page(), in amdgpu_cs_parser_bos(),
amdgpu_cs_list_validate(), and amdgpu_cs_submit(). For HMM,
hmm_vma_range_done() has to be called once and only once after
hmm_vma_get_pfns()/hmm_vma_fault(), so I will call
hmm_vma_range_done() inside amdgpu_cs_submit() after taking the mn
lock. Is my understanding correct?
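Roughly the flow I have in mind, just as a sketch (the exact placement
inside the amdgpu_cs_* functions is my assumption; only the
hmm_vma_get_pfns()/hmm_vma_range_done() pairing is from the HMM
documentation):

    amdgpu_cs_parser_bos():
        /* snapshot the user pages, replaces get_user_pages() */
        down_read(&mm->mmap_sem);
        hmm_vma_get_pfns(&range);   /* or hmm_vma_fault() */
        up_read(&mm->mmap_sem);

    amdgpu_cs_submit():
        amdgpu_mn_lock(p->mn);
        /* the one and only validity check, replaces the scattered
         * mmu_invalidations checks */
        if (!hmm_vma_range_done(&range)) {
            amdgpu_mn_unlock(p->mn);
            return -ERESTARTSYS;    /* restart the whole CS */
        }
        ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
        amdgpu_mn_unlock(p->mn);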
Philip
On 2018-10-02 11:05 AM, Christian König wrote:
After checking more code and documentation,
and thinking about it over my vacation, I think I have some new
conclusions here.
Currently we are using get_user_pages() together with an MMU
notifier to guarantee a coherent view of the address space, because
get_user_pages() works by grabbing a reference to the pages and
ignoring concurrent page table updates.
But HMM uses a different approach by checking the address space
for modifications using hmm_vma_range_done() and re-trying when
the address space has changed.
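In pseudo code the HMM pattern from Documentation/vm/hmm.rst looks
roughly like this (just a sketch, the driver lock here stands for
whatever lock the driver uses to serialize against the notifier):

    again:
        down_read(&mm->mmap_sem);
        hmm_vma_get_pfns(&range);        /* snapshot the CPU page table */
        up_read(&mm->mmap_sem);
        take_driver_lock();
        if (!hmm_vma_range_done(&range)) {
            /* the address space changed, the snapshot is stale */
            release_driver_lock();
            goto again;
        }
        /* use the snapshot while still holding the driver lock */
        release_driver_lock();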
Now what you are trying to do is to mix get_user_pages() with the
HMM callback, and that is what won't work.
We can either use get_user_pages() with MMU notifier or we can
use HMM for the work, but we can't mix and match.
So my initial guess was correct that we just need to change both
sides of the implementation at the same time.
Regards,
Christian.
Am 28.09.2018 um 17:13 schrieb Koenig, Christian:
No it definitely isn't.
We have literally worked for months on this with
the core MM developers.
Making sure that we have a consistent page
array is absolutely vital for correct operation.
Please also check Jerome's presentation from
XDC; it perfectly explains why this approach won't work
correctly.
Christian.
For the B path, we take
mm->mmap_sem, then call hmm_vma_get_pfns() or
get_user_pages(). This is obvious.
For the A path, the mmu notifier callbacks
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end()
are called in many places, and the calling paths inside mm are quite
complicated, so it's not obvious. I checked many of them, for
example:
do_munmap()
    down_write(&mm->mmap_sem)
    arch_unmap()
        mpx_notify_unmap()...
            zap_bt_entries_mapping()
                zap_page_range()
    up_write(&mm->mmap_sem)
void zap_page_range(struct vm_area_struct *vma, unsigned long start,
		unsigned long size)
{
	struct mm_struct *mm = vma->vm_mm;
	struct mmu_gather tlb;
	unsigned long end = start + size;

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm, start, end);
	update_hiwater_rss(mm);
	mmu_notifier_invalidate_range_start(mm, start, end);
	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
		unmap_single_vma(&tlb, vma, start, end, NULL);
	mmu_notifier_invalidate_range_end(mm, start, end);
	tlb_finish_mmu(&tlb, start, end);
}
So AFAIK it's okay without invalidate_range_end() callback.
Regards,
Philip
On 2018-09-28 01:25 AM, Koenig, Christian wrote:
No, that is incorrect as well :)
The mmap_sem isn't necessarily taken during
page table updates.
What you could do is replace
get_user_pages() directly with HMM. If I'm not
completely mistaken that should work as expected.
Christian.
I was trying to understand
how HMM handles this concurrency issue and how we
handle it in amdgpu_ttm_tt_userptr_needs_pages() and
amdgpu_ttm_tt_affect_userptr(). HMM uses the range->valid
flag, we use gtt->mmu_invalidations and
gtt->last_set_pages. Both actually use the same lock-plus-flag
idea.
Thanks for the information. Now I understand that the fence
ttm_eu_fence_buffer_objects() adds to the BOs will block CPU
page table updates. This is another side of this
concurrency issue that I didn't know about.
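(My mental model of that, just as a sketch: the wait below is what I
understand the amdgpu MMU notifier does per affected userptr BO in
invalidate_range_start; the exact wiring in amdgpu_mn.c may differ.)

    if (amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm, start, end)) {
        /* blocks until the fence added by ttm_eu_fence_buffer_objects()
         * at CS time has signaled, i.e. until the GPU job is done */
        reservation_object_wait_timeout_rcu(bo->tbo.resv, true, false,
                                            MAX_SCHEDULE_TIMEOUT);
    }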
I had the same worry that there would be an issue without the
invalidate_range_end() callback, as in the calling sequence
Felix lists. After taking another look today I think it's fine
because of the mm->mmap_sem usage; this is my
understanding:
A path:
    down_write(&mm->mmap_sem);
    mmu_notifier_invalidate_range_start()
        take_lock()
        release_lock()
    CPU page table update
    mmu_notifier_invalidate_range_end()
    up_write(&mm->mmap_sem);

B path:
again:
    down_read(&mm->mmap_sem);
    hmm_vma_get_pfns()
    up_read(&mm->mmap_sem);
    ....
    ....
    take_lock()
    if (!hmm_vma_range_done()) {
        release_lock()
        goto again
    }
    submit command job...
    release_lock()
If you agree, I will submit patch v5 with some minor
changes, and submit another patch to replace
get_user_page() with HMM.
Regards,
Philip
On 2018-09-27 11:36 AM, Christian König wrote:
Yeah, I've read that as
well.
My best guess is that we just need to add a call to
hmm_vma_range_done() after taking the lock and also
replace get_user_pages() with hmm_vma_get_pfns().
But I'm still not 100% sure how all of that is
supposed to work together.
Regards,
Christian.
Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:
I think the answer is here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216
Regards,
Felix
At least with get_user_pages() that is perfectly possible.
For HMM it could be that this is prevented somehow.

> In this case you can end up accessing pages which are invalidated
> while get_user_pages is in process.
What’s the sequence of events you have in mind? Something like this?

- Page table is updated and triggers MMU notifier
- amdgpu MMU notifier runs and waits for pending CS to finish while
  holding the read lock
- New CS starts just after invalidate_range_start MMU notifier
  finishes but before the page table update is done
- get_user_pages returns outdated physical addresses
I hope that’s not
actually possible and that get_user_pages or
hmm_vma_fault would block until the page
table update is done. That is,
invalidate_range_start marks the start of a
page table update, and while that update is
in progress, get_user_pages or hmm_vma_fault
block. Jerome, can you comment on that?
Thanks,
Felix
Yeah I understand
that, but again that won't work.
In this case
you can end up accessing pages which are
invalidated while get_user_pages is in
process.
> I’m not planning to change that. I don’t think there is any need to
> change it.
>
> Yeah, but when HMM doesn't provide both the start and the end hook of
> the invalidation this way won't work any more.
>
> So we need to find a solution for this,
> Christian.
My whole argument is
that you don’t need to hold the read lock
until the invalidate_range_end. Just
read_lock and read_unlock in the
invalidate_range_start function.
Regards,
Felix
Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
> The problem is here:
>
>     ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>     amdgpu_mn_unlock(p->mn);
>
> We need to hold the lock until the fence is added to the reservation
> object.
>
> Otherwise somebody could have changed the page tables just in the
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and
> adding the fence to the reservation object.
I’m not planning to change that. I don’t think there is any need to
change it.

Yeah, but when HMM doesn't provide both the start and the end hook of
the invalidation this way won't work any more.

So we need to find a solution for this,
Christian.
Regards,
Felix
Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
> We double check that there wasn't any page table modification while
> we prepared the submission and restart the whole process when there
> actually was some update.
>
> The reason why we need to do this is here:
>
>     ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>     amdgpu_mn_unlock(p->mn);
>
> Only after the new fence is added to the buffer object can we release
> the lock, so that any invalidation will now block until our command
> submission finishes before it modifies the page table.
I don’t see
why this requires holding the
read-lock until
invalidate_range_end.
amdgpu_ttm_tt_affect_userptr gets
called while the mn read-lock is
held in invalidate_range_start
notifier.
That's not related to
amdgpu_ttm_tt_affect_userptr(), this
function could actually be called
outside the lock.
The problem is here:

    ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
    amdgpu_mn_unlock(p->mn);

We need to hold the lock until the fence is added to the reservation
object.
Otherwise somebody could have changed the page tables just in the
moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and
adding the fence to the reservation object.
Regards,
Christian.
Regards,
Felix
That is correct, but take a look at what we do after calling
amdgpu_mn_read_lock():
	/* No memory allocation is allowed while holding the mn lock */
	amdgpu_mn_lock(p->mn);
	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);

		if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
			r = -ERESTARTSYS;
			goto error_abort;
		}
	}
We double check that there wasn't any page table modification while
we prepared the submission and restart the whole process when there
actually was some update.

The reason why we need to do this is here:

    ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
    amdgpu_mn_unlock(p->mn);

Only after the new fence is added to the buffer object can we release
the lock, so that any invalidation will now block until our command
submission finishes before it modifies the page table.
The only other option would be to add the fence first and then check
if there was any update to the page tables.

The issue with that approach is that adding a fence can't be undone,
so if we find that there actually was an update to the page tables we
would need to somehow turn the CS into a dummy (e.g. overwrite all IBs
with NOPs or something like that) and still submit it.

Not sure if that is actually possible.
Regards,
Christian.
Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
So back to my previous question:

>> But do we really need another lock for this? Wouldn't the
>> re-validation of userptr BOs (currently calling get_user_pages) force
>> synchronization with the ongoing page table invalidation through the
>> mmap_sem or other MM locks?
>
> No and yes. We don't hold any other locks while doing command
> submission, but I expect that HMM has its own mechanism to prevent
> that.
>
> Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we are
> certainly not using this mechanism correctly.
The existing amdgpu_mn_lock/unlock should block the MMU notifier while
a command submission is in progress. It should also block command
submission while an MMU notifier is in progress.

What we lose with HMM is the ability to hold a read-lock for the entire
duration from invalidate_range_start until invalidate_range_end. As I
understand it, that lock is meant to prevent new command submissions
while the page tables are being updated by the kernel. But my point is
that get_user_pages or hmm_vma_fault should do the same kind of thing.
Before the command submission can go ahead, it needs to update the
userptr addresses. If the page tables are still being updated, it will
block there even without holding the amdgpu_mn_read_lock.
Regards,
Felix
No, that
won't work. We would still run
into lock inversion problems.
What we
could do with the scheduler is
to turn submissions into
dummies if we find that the
page tables are now outdated.
But that
would be really hacky and I'm
not sure if that would really
work in all cases.
I had a chat
with Jerome yesterday. He pointed
out that the new blockable
parameter can be used to infer
whether the MMU notifier is being
called in a reclaim operation. So
if blockable==true, it should even
be safe to take the BO reservation
lock without problems. I think
with that we should be able to
remove the read-write locking
completely and go back to locking
(or try-locking for
blockable==false) the reservation
locks in the MMU notifier?
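A rough sketch of what that could look like (hedged: the signature
matches the 4.19 notifier with the blockable parameter, but the BO
lookup and the function name here are only illustrative):

    static int amdgpu_mn_invalidate_range_start(struct mmu_notifier *mn,
                                                struct mm_struct *mm,
                                                unsigned long start,
                                                unsigned long end,
                                                bool blockable)
    {
        struct amdgpu_bo *bo;

        /* ... look up the userptr BOs overlapping [start, end) ... */

        if (!reservation_object_trylock(bo->tbo.resv)) {
            /* called from reclaim, we must not sleep */
            if (!blockable)
                return -EAGAIN;
            reservation_object_lock(bo->tbo.resv, NULL);
        }

        /* wait for pending GPU work, mark the userptr pages as invalid */

        reservation_object_unlock(bo->tbo.resv);
        return 0;
    }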
Regards,
Felix
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Saturday, September 15, 2018 3:47 AM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Jerome Glisse <j.glisse@xxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
> On 2018-09-14 01:52 PM, Christian König wrote:
>> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>>> On 2018-09-14 03:51 AM, Christian König wrote:
>>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>>>> [SNIP]
>>>>>> +		amdgpu_mn_read_unlock(amn);
>>>>>> +
>>>>>
amdgpu_mn_read_lock/unlock support
recursive locking for multiple
>>>>> overlapping
or nested invalidation ranges. But if you're locking
if you'r locking
>>>>> and unlocking
in the same function. Is that
still a concern?
>>> I don't understand
the possible recursive case, but
>>> amdgpu_mn_read_lock()
still support recursive locking.
>>>> Well the real
problem is that unlocking them
here won't work.
>>>>
>>>> We need to hold
the lock until we are sure that
the operation which
>>>> updates the page
tables is completed.
>>>>
>>> The reason for this
change is because hmm mirror has
>>> invalidate_start
callback, no invalidate_end
callback
>>>
>>> Check mmu_notifier.c
and hmm.c again, below is entire
logic to
>>> update CPU page
tables and callback:
>>>
>>> mn lock amn->lock
is used to protect interval tree
access because
>>> user may
submit/register new userptr
anytime.
>>> This is same for old
and new way.
>>>
>>> step 2 guarantee the
GPU operation is done before
updating CPU page
>>> table.
>>>
>>> So I think the change
is safe. We don't need hold mn
lock until the
>>> CPU page tables
update is completed.
>> No, that isn't even
remotely correct. The lock doesn't
protects the
>> interval tree.
>>
>>> Old:
>>> 1.
down_read_non_owner(&amn->lock)
>>> 2. loop to handle
BOs from node->bos through
interval tree
>>> amn->object nodes
>>> gfx: wait for
pending BOs fence operation done,
mark user
>>> pages dirty
>>> kfd: evict
user queues of the process, wait
for queue
>>> unmap/map operation
done
>>> 3. update CPU
page tables
>>> 4.
up_read(&amn->lock)
>>>
>>> New, switch step 3
and 4
>>> 1.
down_read_non_owner(&amn->lock)
>>> 2. loop to handle
BOs from node->bos through
interval tree
>>> amn->object nodes
>>> gfx: wait for
pending BOs fence operation done,
mark user
>>> pages dirty
>>> kfd: evict
user queues of the process, wait
for queue
>>> unmap/map operation
done
>>> 3.
up_read(&amn->lock)
>>> 4. update CPU
page tables
>> The lock is there to make
sure that we serialize page table
updates
>> with command submission.
> As I understand it, the idea
is to prevent command submission
(adding
> new fences to BOs) while a
page table invalidation is in
progress.
Yes, exactly.
> But do we really need another
lock for this? Wouldn't the
> re-validation of userptr BOs
(currently calling get_user_pages)
force
> synchronization with the
ongoing page table invalidation
through the
> mmap_sem or other MM locks?
No and yes. We don't hold any
other locks while doing command
submission, but I expect that HMM
has its own mechanism to prevent
that.
Since we don't modify
amdgpu_mn_lock()/amdgpu_mn_unlock()
we are certainly not using this
mechanism correctly.
Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx