On 2016年07月25日 18:31, Christian König wrote:
> First of all, patches #10 and #11 look like bug fixes to existing code
> to me, so we should fix those problems before working on anything else.
>
> Patch #10 is Reviewed-by: Christian König <christian.koenig at amd.com>
>
> Patch #11:
>
>>      list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
>>          struct amd_sched_fence *s_fence = s_job->s_fence;
>> -        struct fence *fence = sched->ops->run_job(s_job);
>> +        struct fence *fence;
>>
>> +        spin_unlock(&sched->job_list_lock);
>> +        fence = sched->ops->run_job(s_job);
>>          atomic_inc(&sched->hw_rq_count);
>>          if (fence) {
>>              s_fence->parent = fence_get(fence);
>> @@ -451,6 +453,7 @@ void amd_sched_job_recovery(struct amd_gpu_scheduler *sched)
>>              DRM_ERROR("Failed to run job!\n");
>>              amd_sched_process_job(NULL, &s_fence->cb);
>>          }
>> +        spin_lock(&sched->job_list_lock);
>>      }
>>      spin_unlock(&sched->job_list_lock);
>
> The problem is that the job might complete while we have dropped the lock.
>
> Please use list_for_each_entry_safe here and add a comment explaining why
> the list can be modified in the meantime.
>
> With that fixed, the patch is Reviewed-by: Christian König
> <christian.koenig at amd.com> as well.

OK, pushed the above two.

> The remaining set looks very good to me as well, but I was rather
> thinking of a more general approach instead of making it VM PD/PT
> specific.
>
> For example, we also need to back up/restore shaders when a hard GPU
> reset happens.
>
> So I would suggest the following:
>
> 1. We add an optional "shadow" flag so that when a BO is allocated in
> VRAM we also allocate a shadow BO in GART.
>
> 2. We add another "backup" flag that says the BO is backed up from VRAM
> to GART before the next command submission.
>
> 3. We set the shadow flag for VM PD/PT BOs and, every time we modify
> them, set the backup flag so they get backed up on the next CS.
>
> 4. We add an IOCTL that allows setting the backup flag from userspace so
> that we can trigger another backup even after the first CS.
>
> What do you think?

Sounds very good, will try.

Thanks,
David Zhou

> Regards,
> Christian.
>
> Am 25.07.2016 um 09:22 schrieb Chunming Zhou:
>> Since we cannot make sure VRAM is safe after a GPU reset, backing up the
>> page tables is necessary; a shadow page table is a sensible way to
>> recover the page tables when a GPU reset happens.
>> We need to allocate a GTT BO as the shadow of the VRAM BO when creating
>> a page table and keep the two in sync. After a GPU reset, we use SDMA to
>> copy the GTT BO content back to the VRAM BO, and the page tables are
>> recovered.
>>
>> Chunming Zhou (13):
>>   drm/amdgpu: add pd/pt bo shadow
>>   drm/amdgpu: update shadow pt bo while update pt
>>   drm/amdgpu: update pd shadow while updating pd
>>   drm/amdgpu: implement amdgpu_vm_recover_page_table_from_shadow
>>   drm/amdgpu: link all vm clients
>>   drm/amdgpu: add vm_list_lock
>>   drm/amd: add block entity function
>>   drm/amdgpu: recover page tables after gpu reset
>>   drm/amdgpu: add vm recover pt fence
>>   drm/amd: reset hw count when reset job
>>   drm/amd: fix deadlock of job_list_lock
>>   drm/amd: wait neccessary dependency before running job
>>   drm/amdgpu: fix sched deadoff
>>
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  17 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  12 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  30 ++++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   5 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   5 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 161 ++++++++++++++++++++++++--
>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  35 +++++-
>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |   3 +
>>  8 files changed, 250 insertions(+), 18 deletions(-)
>>
>
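
To make the locking comment on patch #11 concrete, below is roughly what the
recovery loop looks like with the suggested list_for_each_entry_safe change
applied. This is only a sketch built from the names in the quoted hunk, not
the patch as it was actually committed; the fence-callback handling that the
hunk elides is left as a comment.

        struct amd_sched_job *s_job, *tmp;

        spin_lock(&sched->job_list_lock);
        /*
         * run_job() is called with job_list_lock dropped, so a job may
         * complete and remove itself from ring_mirror_list in the
         * meantime; the _safe variant keeps the iteration valid.
         */
        list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
                struct amd_sched_fence *s_fence = s_job->s_fence;
                struct fence *fence;

                /*
                 * Drop the lock around run_job() to avoid the
                 * job_list_lock deadlock fixed by this series.
                 */
                spin_unlock(&sched->job_list_lock);
                fence = sched->ops->run_job(s_job);
                atomic_inc(&sched->hw_rq_count);
                if (fence) {
                        s_fence->parent = fence_get(fence);
                        /* ... fence callback setup as in the existing code ... */
                } else {
                        DRM_ERROR("Failed to run job!\n");
                        amd_sched_process_job(NULL, &s_fence->cb);
                }
                /* Re-acquire the lock before touching the list again. */
                spin_lock(&sched->job_list_lock);
        }
        spin_unlock(&sched->job_list_lock);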
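
And a rough, purely illustrative sketch of the shadow/backup direction
discussed above: a GART shadow kept in sync with a VRAM BO and copied back
with SDMA after a reset. Every name here (struct amdgpu_shadowed_bo,
backup_pending, amdgpu_shadow_mark_dirty(), amdgpu_shadow_recover()) is made
up for this sketch, and amdgpu_copy_buffer() is assumed to have its current
(ring, src, dst, size, resv, fence) signature; the real implementation may
look quite different.

/* Hypothetical pairing of a VRAM BO with its GART shadow. */
struct amdgpu_shadowed_bo {
        struct amdgpu_bo *bo;           /* primary BO in VRAM           */
        struct amdgpu_bo *shadow;       /* shadow copy in GART          */
        bool backup_pending;            /* copy VRAM->GART on next CS   */
};

/* Steps 2/3: mark the BO so the next command submission backs it up. */
static void amdgpu_shadow_mark_dirty(struct amdgpu_shadowed_bo *sbo)
{
        if (sbo->shadow)
                sbo->backup_pending = true;
}

/*
 * After a GPU reset VRAM cannot be trusted, so the copy runs the other
 * way: SDMA copies the GART shadow back into the VRAM BO, restoring the
 * page directory/tables (and, in the more general scheme, shaders)
 * before jobs are resubmitted.
 */
static int amdgpu_shadow_recover(struct amdgpu_device *adev,
                                 struct amdgpu_shadowed_bo *sbo,
                                 struct fence **fence)
{
        struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;

        /* Assumed helper: a plain SDMA buffer copy, GART -> VRAM. */
        return amdgpu_copy_buffer(ring,
                                  amdgpu_bo_gpu_offset(sbo->shadow),
                                  amdgpu_bo_gpu_offset(sbo->bo),
                                  amdgpu_bo_size(sbo->bo),
                                  NULL, fence);
}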