Hello,

Applying this locally, the issue we were seeing with very high
submit times in high-end workloads seems largely gone. My
methodology is to measure the total time spent in
DRM_IOCTL_AMDGPU_CS with `strace -T` for the whole first scene of
the Shadow of the Tomb Raider benchmark, and divide by the frame
count in that scene to get an idea of how much CPU time is spent
in submissions per frame. More details below.

On a Vega20 system with a 3900X, at High settings (~6 gigs of VRAM
usage according to UMR, no contention):
 - 5.2.14: 1.1ms per frame in CS
 - 5.2.14 + LRU bulk moves: 0.6ms per frame in CS

On a Polaris10 system with an i7-7820X, at Very High settings (7.7G/8G
VRAM used, no contention):
 - 5.2.15: 12.03ms per frame in CS (!)
 - 5.2.15 + LRU bulk moves: 1.35ms per frame in CS

The issue is largely addressed. 1.35ms is still higher than I'd expect,
but it's pretty reasonable. Note that in many of our use cases,
submission happens in a separate thread and doesn't typically impact
overall frame time/latency if you have extra CPU cores to work with.
However, it very negatively affects performance as soon as the CPU gets
saturated, and it burns a ton of power.

Thanks!
 - Pierre-Loup

Methodology details:

# Mesa patched to kill() itself with SIGCONT in vkQueuePresent to act
# as a frame marker in-band with the strace data.

# strace collection:
strace -f -p 13113 -e ioctl,kill -o sottr_first_scene_vanilla -T

# frame count:
cat sottr_first_scene_vanilla | grep kill\( | wc -l

# total time spent in _CS:
cat sottr_first_scene_vanilla | grep AMDGPU_CS | grep -v unfinished | tr -s ' ' | cut -d ' ' -f7 | tr -d \< | tr -d \> | xargs | tr ' ' '+' | bc

# seconds to milliseconds, then divide by frame count:
(gdb) p 7.41 * 1000.0 / 616.0
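For convenience, the steps above could also be rolled into one small
script. A rough sketch (untested; the script name is made up, and it
assumes the timing is the trailing <seconds> field that strace -T
appends to each completed call):

#!/bin/sh
# cs_per_frame.sh (hypothetical) -- per-frame CS time from an
# strace -T capture; same arithmetic as the manual steps above.
trace="$1"

# one SIGCONT kill() per vkQueuePresent == one frame
frames=$(grep -c 'kill(' "$trace")

# sum the trailing <seconds> field of every completed AMDGPU_CS ioctl
total=$(grep AMDGPU_CS "$trace" | grep -v unfinished |
        awk -F'[<>]' '{ sum += $(NF-1) } END { print sum }')

# seconds -> milliseconds, divided by frame count
awk -v t="$total" -v f="$frames" \
    'BEGIN { printf "%.2f ms per frame in CS\n", t * 1000 / f }'

e.g.: sh cs_per_frame.sh sottr_first_scene_vanilla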
On 9/12/19 8:18 AM, Zhou, David(ChunMing) wrote: