[PATCH 09/12] drm/amdgpu: recovery hw jobs when gpu reset

david1.zhou@xxxxxxx (zhoucm1) · Fri, 1 Jul 2016 17:50:43 +0800



On 2016-07-01 17:30, Christian König wrote:
> On 30.06.2016 at 11:34, Chunming Zhou wrote:
>> Change-Id: If10da1e224d81a12fd4f8d760c48178adb9e82d0
>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c     | 4 ++--
>>   2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a3ca83f..0759c23 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2002,8 +2002,9 @@ retry:
>>               struct amdgpu_ring *ring = adev->rings[i];
>>               if (!ring)
>>                   continue;
>> +            amd_sched_job_recovery(&ring->sched);
>>               kthread_unpark(ring->sched.thread);
>> -            amdgpu_ring_restore(ring, ring_sizes[i], ring_data[i]);
>> +            kfree(ring_data[i]);
>>               ring_sizes[i] = 0;
>>               ring_data[i] = NULL;
>>           }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index cced2f6..7393473 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -384,11 +384,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring,
>>           amdgpu_ring_emit_pipeline_sync(ring);
>>         if (ring->funcs->emit_vm_flush &&
>> -        pd_addr != AMDGPU_VM_NO_FLUSH) {
>> +        (pd_addr != AMDGPU_VM_NO_FLUSH || 
>> amdgpu_vm_is_gpu_reset(adev, id))) {
>>           struct fence *fence;
>>             trace_amdgpu_vm_flush(pd_addr, ring->idx, vm_id);
>> -        amdgpu_ring_emit_vm_flush(ring, vm_id, pd_addr);
>> +        amdgpu_ring_emit_vm_flush(ring, vm_id, id->pd_gpu_addr);
>
> NAK, we need to handle this differently. The problem is that
> id->pd_gpu_addr could already have been reset when you have more than one
> submission to the same engine.
>
> E.g. submission A1 uses VMID 1 and PD address A, and submission B1 uses
> VMID 1 as well but PD address B. When we do it like this we would use
> PD address B for both submissions on restart.
Ah, I just realized my branch doesn't have your "save the PD..." patch,
which already saves the PD address in the job, so we can use it directly.
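
To make the A1/B1 case concrete, here is a minimal standalone sketch, not
driver code. The sketch_* types and functions are hypothetical stand-ins,
and the pd_addr field in the job is only assumed to mirror what the
"save the PD..." patch stores:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct sketch_vm_id {
	uint64_t pd_gpu_addr;    /* PD address of the latest user of this VMID */
};

struct sketch_job {
	struct sketch_vm_id *id; /* VMID slot, shared between submissions */
	uint64_t pd_addr;        /* PD address saved at submission time (assumed) */
};

/* Submitting a job overwrites the PD address kept in the shared VMID slot. */
static void sketch_submit(struct sketch_job *job, struct sketch_vm_id *id,
			  uint64_t pd_addr)
{
	assert(job != NULL && id != NULL);
	job->id = id;
	job->pd_addr = pd_addr;
	id->pd_gpu_addr = pd_addr;
}

/* Replay as in this patch: take the address from the shared VMID slot. */
static uint64_t sketch_replay_from_id(const struct sketch_job *job)
{
	return job->id->pd_gpu_addr;
}

/* Replay using the address saved in the job itself. */
static uint64_t sketch_replay_from_job(const struct sketch_job *job)
{
	return job->pd_addr;
}
```

If A1 is submitted with PD address A and B1 with PD address B on the same
VMID, sketch_replay_from_id() returns B for both jobs, while
sketch_replay_from_job() returns the address each job was submitted with.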

>
> I suggest just dropping the AMDGPU_VM_NO_FLUSH special value and using a
> boolean to signal that a flush is needed instead.
Yes.
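
A rough standalone sketch of how the boolean could look, under my own
assumptions: the struct, the vm_needs_flush field and the helper are
hypothetical names, and forcing a flush after a reset is only how I read
the intent of this patch, not a settled design:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job fields: the address is always valid, the flag says
 * whether a flush has to be emitted. */
struct sketch_flush_job {
	uint64_t pd_addr;        /* real PD address, never a sentinel value */
	bool     vm_needs_flush; /* set at submission when a flush is required */
};

/* Decide whether to emit a VM flush for this job. After a GPU reset the
 * flush is forced (assumption), and pd_addr is still usable because it
 * was never replaced by a "no flush" marker. */
static bool sketch_should_flush(const struct sketch_flush_job *job,
				bool after_gpu_reset)
{
	assert(job != NULL);
	return job->vm_needs_flush || after_gpu_reset;
}
```

The point of the split is that the "no flush" decision no longer destroys
the address, so a resubmitted job can still flush with the right PD.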

Thanks,
David Zhou
>
> Regards,
> Christian.
>
>>             r = amdgpu_fence_emit(ring, &fence);
>>           if (r)
>