Re: [PATCH] drm/amdgpu: revert "disable bulk moves for now"

Hello,

After applying this locally, the issue we were seeing with very high submit times in high-end workloads seems largely gone. My methodology is to measure the total time spent in DRM_IOCTL_AMDGPU_CS with `strace -T` over the whole first scene of the Shadow of the Tomb Raider benchmark, then divide by the frame count of that scene to get an idea of how much CPU time is spent in submissions per frame. More details below.

On a Vega20 system with a 3900X, at High settings (~6 gigs of VRAM usage according to UMR, no contention):

 - 5.2.14: 1.1ms per frame in CS

 - 5.2.14 + LRU bulk moves: 0.6ms per frame in CS

On a Polaris10 system with an i7-7820X, at Very High settings (7.7G/8G VRAM used, no contention):

 - 5.2.15: 12.03ms per frame in CS (!)

 - 5.2.15 + LRU bulk moves: 1.35ms per frame in CS

The issue is largely addressed. 1.35ms is still higher than I'd expect, but it's pretty reasonable. Note that in many of our use cases, submission happens in a separate thread and doesn't typically impact overall frame time/latency if you have spare CPU cores to work with. However, it very negatively affects performance as soon as the CPU gets saturated, and it burns a ton of power.

Thanks!

 - Pierre-Loup

Methodology details:

# Mesa patched to kill() itself with SIGCONT in vkQueuePresentKHR to act as a frame marker in-band with the strace data.
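A minimal sketch of that marker hack, for reference (illustrative only; the helper name and the exact hook point in the driver are assumptions, not the actual Mesa patch):

#include <signal.h>
#include <unistd.h>

/* In-band frame marker: SIGCONT is harmless to a process that isn't
 * stopped, but the kill() syscall still shows up in the strace log,
 * so every present delimits a frame in the trace. */
static void frame_marker(void)
{
        kill(getpid(), SIGCONT);
}

/* Called once per frame from the driver's vkQueuePresentKHR entry
 * point, before the actual present work. */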

# strace collection:

strace -f -p 13113 -e ioctl,kill -o sottr_first_scene_vanilla -T

# frame count:

cat sottr_first_scene_vanilla | grep kill\( | wc -l
616

# total time spent in AMDGPU_CS (sum of the <time> values appended by -T, field 7):

cat sottr_first_scene_vanilla | grep AMDGPU_CS | grep -v unfinished | tr -s ' '  | cut -d ' ' -f7 | tr -d \< | tr -d \> | xargs  | tr ' ' '+' | bc
7.411782

# seconds to milliseconds, then divide by frame count

(gdb) p 7.41 * 1000.0 / 616.0
$1 = 12.029220779220779

On 9/12/19 8:18 AM, Zhou, David(ChunMing) wrote:
I don't know the DKMS status; anyway, we should submit this one as early as possible.

-------- Original Message --------
Subject: Re: [PATCH] drm/amdgpu: revert "disable bulk moves for now"
From: Christian König
To: "Zhou, David(ChunMing)", amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc:

Just to double check: we've had that enabled in the DKMS package for a
while and haven't encountered any more problems with it, correct?

Thanks,
Christian.

On 12.09.19 at 16:02, Chunming Zhou wrote:
> RB on it; go ahead.
>
> -David
>
> On 2019/9/12 18:15, Christian König wrote:
>> This reverts commit a213c2c7e235cfc0e0a161a558f7fdf2fb3a624a.
>>
>> The changes to fix this should have landed in 5.1.
>>
>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 --
>>    1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 48349e4f0701..fd3fbaa73fa3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -603,14 +603,12 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,
>>       struct ttm_bo_global *glob = adev->mman.bdev.glob;
>>       struct amdgpu_vm_bo_base *bo_base;
>>   
>> -#if 0
>>       if (vm->bulk_moveable) {
>>               spin_lock(&glob->lru_lock);
>>               ttm_bo_bulk_move_lru_tail(&vm->lru_bulk_move);
>>               spin_unlock(&glob->lru_lock);
>>               return;
>>       }
>> -#endif
>>   
>>       memset(&vm->lru_bulk_move, 0, sizeof(vm->lru_bulk_move));
>>   


_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
