Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

On 2025-02-12 23:33, Deng, Emily wrote:

From: Yang, Philip <Philip.Yang@xxxxxxx>
Sent: Wednesday, February 12, 2025 10:31 PM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2025-02-12 03:54, Deng, Emily wrote:


Ping……

 

Emily Deng

Best Wishes

From: Deng, Emily <Emily.Deng@xxxxxxx>
Sent: Tuesday, February 11, 2025 8:21 PM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 


Hi Philip,

     Upon further consideration, removing amdgpu_amdkfd_unreserve_mem_limit is challenging because it is paired with amdgpu_amdkfd_reserve_mem_limit in svm_migrate_ram_to_vram. However, this pairing does introduce issues, as it prevents amdgpu_amdkfd_reserve_mem_limit from accurately detecting out-of-memory conditions. Ideally, amdgpu_amdkfd_unreserve_mem_limit should be tied to the actual freeing of memory. Furthermore, since ttm_bo_delayed_delete delays the call to amdgpu_vram_mgr_del, there remains a possibility that amdgpu_amdkfd_reserve_mem_limit reports sufficient memory, while a subsequent call to amdgpu_vram_mgr_new fails. For these reasons, I believe this patch is still necessary.
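Schematically, the pairing looks like this (a simplified sketch of the flow described above, with argument lists elided; not the exact code):

    r = amdgpu_amdkfd_reserve_mem_limit(adev, size, KFD_IOC_ALLOC_MEM_FLAGS_VRAM, ...);
    if (r)
            return r;                        /* the intended early out-of-memory check */

    r = svm_migrate_vma_to_vram(...);        /* the migrated pages now occupy VRAM */

    amdgpu_amdkfd_unreserve_mem_limit(adev, size, KFD_IOC_ALLOC_MEM_FLAGS_VRAM, ...);
    /* the accounting is released here while the VRAM is still in use, so a
     * later reserve can report success even though amdgpu_vram_mgr_new fails */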

 

Emily Deng

Best Wishes

From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Deng, Emily
Sent: Tuesday, February 11, 2025 6:56 PM
To: Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 


From: Yang, Philip <Philip.Yang@xxxxxxx>
Sent: Tuesday, February 11, 2025 6:54 AM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2025-02-10 02:51, Deng, Emily wrote:


From: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>
Sent: Monday, February 10, 2025 10:18 AM
To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2/7/2025 9:02 PM, Deng, Emily wrote:

Ping.......
 
Emily Deng
Best Wishes
 
 
 
-----Original Message-----
From: Emily Deng <Emily.Deng@xxxxxxx>
Sent: Friday, February 7, 2025 6:28 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deng, Emily <Emily.Deng@xxxxxxx>
Subject: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work
 
It will hit a deadlock in svm_range_restore_work randomly.
Details as below:
1.svm_range_restore_work
      ->svm_range_list_lock_and_flush_work
      ->mmap_write_lock
2.svm_range_restore_work
      ->svm_range_validate_and_map
      ->amdgpu_vm_update_range
      ->amdgpu_vm_ptes_update
      ->amdgpu_vm_pt_alloc
      ->svm_range_evict_svm_bo_worker

svm_range_evict_svm_bo_worker is a function run by a kernel task from the default system_wq. It is not the task that runs svm_range_restore_work, which is from system_freezable_wq. The second task may need to wait for the first task to release mmap_write_lock, but there is no cyclic lock dependency.

Can you explain more about how the deadlock happened? If a deadlock exists between two tasks, there should be at least two locks used by both tasks.

Regards

Xiaogang

In step 2, during amdgpu_vm_pt_alloc, the system encounters insufficient memory and triggers an eviction. This kicks off the svm_range_evict_svm_bo_worker task, and the allocation path then waits for the eviction_fence to be signaled. However, svm_range_evict_svm_bo_worker cannot acquire mmap_read_lock(mm), which prevents it from signaling the eviction_fence. As a result, amdgpu_vm_pt_alloc remains incomplete and cannot release mmap_write_lock(mm).

In other words, the svm_range_restore_work task holds mmap_write_lock(mm) and is stuck waiting for the eviction_fence to be signaled by svm_range_evict_svm_bo_worker, while svm_range_evict_svm_bo_worker is itself blocked, unable to acquire mmap_read_lock(mm). This creates a deadlock.
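Putting the two tasks side by side (an illustrative ordering reconstructed from the traces above):

Task A: svm_range_restore_work (system_freezable_wq)
      ->mmap_write_lock(mm)
      ->svm_range_validate_and_map
      ->amdgpu_vm_pt_alloc hits out-of-memory and evicts an SVM BO
      ->waits for eviction_fence          <- blocks while still holding mmap_write_lock

Task B: svm_range_evict_svm_bo_worker (system_wq)
      ->mmap_read_lock(mm)                <- blocks behind Task A's write lock
      ->(would migrate the range back and signal eviction_fence)

Task A waits on a fence that only Task B can signal, and Task B waits on a lock that only Task A can release: a cycle through one lock and one fence.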

This deadlock should not happen, as svm_range_restore_work is only used in the xnack-off case, where there is no VRAM overcommitment thanks to KFD's amdgpu_amdkfd_reserve_mem_limit. We reserve ESTIMATE_PT_SIZE of VRAM for page table allocation precisely to prevent this situation.
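Roughly, the head-room check works like this (simplified from amdgpu_amdkfd_gpuvm.c; the exact field names may differ across kernel versions):

    reserved_for_pt = ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
    if (adev->kfd.vram_used + vram_needed >
        adev->gmc.real_vram_size - reserved_for_pt)
            return -ENOMEM;    /* refuse up front, so page table allocation
                                * under mmap_write_lock never has to evict */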

Regards,

Philip

Hi Philip,

     You're correct. Upon further investigation, the issue arises from the additional call to amdgpu_amdkfd_unreserve_mem_limit in svm_migrate_ram_to_vram, which prevents amdgpu_amdkfd_reserve_mem_limit from detecting the out-of-memory condition. I will submit another patch to remove the amdgpu_amdkfd_unreserve_mem_limit call in svm_migrate_ram_to_vram.

 

We check that all SVM memory fits in system memory; we don't account for SVM VRAM usage. With xnack off, the application should check the available VRAM size and avoid VRAM overcommitment.

svm_range_restore_work ensures all SVM ranges are mapped to the GPUs and then resumes the queues; this is done by taking the mmap write lock and flushing the deferred_range_list. Downgrading to the mmap read lock cannot prevent unmapping from the CPU side, because the mmu notifier callback can add a range to the deferred_range_list again and unmap it from the GPUs, so this patch cannot work.

Maybe I misunderstand, but downgrading to a read lock could also prevent svm_range_deferred_list_work from acquiring the write lock. As a result, it could potentially block unmapping operations from the GPUs.

No, svm_range_cpu_invalidate_pagetables takes the prange lock to split the prange, adds it to the deferred_list if needed, then unmaps it from the GPU and returns.
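In trace form (an illustrative interleaving, assuming the downgrade patch were applied):

1. svm_range_restore_work
      ->mmap_write_lock(mm), flush deferred_range_list
      ->mmap_write_downgrade(mm)
      ->svm_range_validate_and_map            maps prange to GPUs
2. svm_range_cpu_invalidate_pagetables        (mmu notifier callback; does not need the mmap lock)
      ->take prange lock, split prange, add to deferred_range_list
      ->unmap from GPUs and return

A range mapped in step 1 can be unmapped again in step 2 before the queues resume, so holding the read lock guarantees nothing.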

This needs an application fix, not overcommitment: prefetch SVM ranges to VRAM if xnack is off.

Regards,

Philip

 

Emily Deng

Best Wishes

 

 

We should not use the mmap write lock to synchronize with the mmu notifier; there is a plan to rework the SVM locks to fix this.

Regards,

Philip

Emily Deng

Best Wishes

 

 

 

Emily Deng

Best Wishes

 

 

      ->mmap_read_lock (deadlock here, because mmap_write_lock is already held)
 
How to fix?
Downgrade the write lock to a read lock.
 
Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bd3e20d981e0..c907e2de3dde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1841,6 +1841,7 @@ static void svm_range_restore_work(struct work_struct *work)
      mutex_lock(&process_info->lock);
      svm_range_list_lock_and_flush_work(svms, mm);
      mutex_lock(&svms->lock);
+      mmap_write_downgrade(mm);
 
      evicted_ranges = atomic_read(&svms->evicted_ranges);
 
@@ -1890,7 +1891,7 @@ static void svm_range_restore_work(struct work_struct *work)
 
out_reschedule:
      mutex_unlock(&svms->lock);
-      mmap_write_unlock(mm);
+      mmap_read_unlock(mm);
      mutex_unlock(&process_info->lock);
 
      /* If validation failed, reschedule another attempt */
--
2.34.1
 
