On 2016年07月25日 18:31, Christian König wrote:
> First of all, patches #10 and #11 look like bug fixes to existing code
> to me, so we should fix those problems before working on anything else.
>
> Patch #10 is Reviewed-by: Christian König <christian.koenig at amd.com>
>
> Patch #11:
>
>>      list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
>>          struct amd_sched_fence *s_fence = s_job->s_fence;
>> -        struct fence *fence = sched->ops->run_job(s_job);
>> +        struct fence *fence;
>>
>> +        spin_unlock(&sched->job_list_lock);
>> +        fence = sched->ops->run_job(s_job);
>>          atomic_inc(&sched->hw_rq_count);
>>          if (fence) {
>>              s_fence->parent = fence_get(fence);
>> @@ -451,6 +453,7 @@ void amd_sched_job_recovery(struct amd_gpu_scheduler *sched)
>>              DRM_ERROR("Failed to run job!\n");
>>              amd_sched_process_job(NULL, &s_fence->cb);
>>          }
>> +        spin_lock(&sched->job_list_lock);
>>      }
>>      spin_unlock(&sched->job_list_lock);
>
> The problem is that the job might complete while we have dropped the lock.
>
> Please use list_for_each_entry_safe here and add a comment explaining why
> the list can be modified in the meantime.
>
> With that fixed, the patch is Reviewed-by: Christian König
> <christian.koenig at amd.com> as well.

OK, pushed the above two.

> The remaining set looks very good to me as well, but I was rather
> thinking of a more general approach instead of making it VM PD/PT
> specific.
>
> For example, we also need to back up/restore shaders when a hard GPU
> reset happens.
>
> So I would suggest the following:
>
> 1. We add an optional "shadow" flag so that when a BO is allocated in
> VRAM we also allocate a shadow BO in GART.
>
> 2. We add another "backup" flag that says the BO is backed up from VRAM
> to GART before the next command submission.
>
> 3. We set the shadow flag for VM PD/PT BOs and, every time we modify
> them, set the backup flag so they get backed up on the next CS.
>
> 4. We add an IOCTL that allows setting the backup flag from userspace so
> that we can trigger another backup even after the first CS.
>
> What do you think?

Sounds very good, will try.

Thanks,
David Zhou

> Regards,
> Christian.
>
> Am 25.07.2016 um 09:22 schrieb Chunming Zhou:
>> Since we cannot make sure VRAM is safe after a GPU reset, backing up the
>> page tables is necessary; a shadow page table is a sensible way to
>> recover the page tables when a GPU reset happens.
>> We need to allocate a GTT BO as the shadow of the VRAM BO when creating
>> a page table and keep the two in sync. After a GPU reset, we use SDMA to
>> copy the GTT BO content back to the VRAM BO, and the page tables are
>> recovered.
>>
>> Chunming Zhou (13):
>>   drm/amdgpu: add pd/pt bo shadow
>>   drm/amdgpu: update shadow pt bo while update pt
>>   drm/amdgpu: update pd shadow while updating pd
>>   drm/amdgpu: implement amdgpu_vm_recover_page_table_from_shadow
>>   drm/amdgpu: link all vm clients
>>   drm/amdgpu: add vm_list_lock
>>   drm/amd: add block entity function
>>   drm/amdgpu: recover page tables after gpu reset
>>   drm/amdgpu: add vm recover pt fence
>>   drm/amd: reset hw count when reset job
>>   drm/amd: fix deadlock of job_list_lock
>>   drm/amd: wait neccessary dependency before running job
>>   drm/amdgpu: fix sched deadoff
>>
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  17 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  12 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  30 ++++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   5 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   5 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 161 ++++++++++++++++++++++++--
>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  35 +++++-
>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |   3 +
>>  8 files changed, 250 insertions(+), 18 deletions(-)
>>
>
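
To make the locking comment on patch #11 concrete, below is roughly what the
recovery loop looks like with the suggested list_for_each_entry_safe change
applied. This is only a sketch built from the names in the quoted hunk, not
the patch as it was actually committed; the fence-callback handling that the
hunk elides is left as a comment.

        struct amd_sched_job *s_job, *tmp;

        spin_lock(&sched->job_list_lock);
        /*
         * run_job() is called with job_list_lock dropped, so a job may
         * complete and remove itself from ring_mirror_list in the
         * meantime; the _safe variant keeps the iteration valid.
         */
        list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
                struct amd_sched_fence *s_fence = s_job->s_fence;
                struct fence *fence;

                /*
                 * Drop the lock around run_job() to avoid the
                 * job_list_lock deadlock fixed by this series.
                 */
                spin_unlock(&sched->job_list_lock);
                fence = sched->ops->run_job(s_job);
                atomic_inc(&sched->hw_rq_count);
                if (fence) {
                        s_fence->parent = fence_get(fence);
                        /* ... fence callback setup as in the existing code ... */
                } else {
                        DRM_ERROR("Failed to run job!\n");
                        amd_sched_process_job(NULL, &s_fence->cb);
                }
                /* Re-acquire the lock before touching the list again. */
                spin_lock(&sched->job_list_lock);
        }
        spin_unlock(&sched->job_list_lock);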
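
And a rough, purely illustrative sketch of the shadow/backup direction
discussed above: a GART shadow kept in sync with a VRAM BO and copied back
with SDMA after a reset. Every name here (struct amdgpu_shadowed_bo,
backup_pending, amdgpu_shadow_mark_dirty(), amdgpu_shadow_recover()) is made
up for this sketch, and amdgpu_copy_buffer() is assumed to have its current
(ring, src, dst, size, resv, fence) signature; the real implementation may
look quite different.

/* Hypothetical pairing of a VRAM BO with its GART shadow. */
struct amdgpu_shadowed_bo {
        struct amdgpu_bo *bo;           /* primary BO in VRAM           */
        struct amdgpu_bo *shadow;       /* shadow copy in GART          */
        bool backup_pending;            /* copy VRAM->GART on next CS   */
};

/* Steps 2/3: mark the BO so the next command submission backs it up. */
static void amdgpu_shadow_mark_dirty(struct amdgpu_shadowed_bo *sbo)
{
        if (sbo->shadow)
                sbo->backup_pending = true;
}

/*
 * After a GPU reset VRAM cannot be trusted, so the copy runs the other
 * way: SDMA copies the GART shadow back into the VRAM BO, restoring the
 * page directory/tables (and, in the more general scheme, shaders)
 * before jobs are resubmitted.
 */
static int amdgpu_shadow_recover(struct amdgpu_device *adev,
                                 struct amdgpu_shadowed_bo *sbo,
                                 struct fence **fence)
{
        struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;

        /* Assumed helper: a plain SDMA buffer copy, GART -> VRAM. */
        return amdgpu_copy_buffer(ring,
                                  amdgpu_bo_gpu_offset(sbo->shadow),
                                  amdgpu_bo_gpu_offset(sbo->bo),
                                  amdgpu_bo_size(sbo->bo),
                                  NULL, fence);
}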