Patch #1-#4 are Reviewed-by: Christian König <christian.koenig at amd.com>, please commit those so that we can consider them as done. Patch #5: > > +int amdgpu_bo_sync_between_bo_and_shadow(struct amdgpu_device *adev, > + struct amdgpu_ring *ring, > + struct amdgpu_bo *bo, > + struct reservation_object *resv, > + struct fence **fence) Maybe rename this to amdgpu_bo_backup_shadow(), when talking about sync at least I always think about synchronization using fences. > + > + if (bo->backup_shadow & AMDGPU_BO_SHADOW_TO_NONE) { That condition will never be true, cause AMDGPU_BO_SHADOW_TO_NONE is defined as 0. You probably wanted to use "==" here, don't you? > + r = amdgpu_bo_pin(bo, bo->prefered_domains, &bo_addr); > + if (r) { > + DRM_ERROR("Failed to pin bo object\n"); > + goto err1; > + } > + r = amdgpu_bo_pin(bo->shadow, bo->shadow->prefered_domains, &shadow_addr); > + if (r) { > + DRM_ERROR("Failed to pin bo shadow object\n"); > + goto err2; > + } Again don't use amdgpu_bo_pin() here. Patch #6: > + r = amdgpu_bo_reserve(vm->page_directory, false); > + if (r) > + return r; You need to double check if the BOs aren't swapped out. See amdgpu_gem_va_update_vm() on how to do this. > + fence_put(vm->shadow_sync_fence); > + vm->shadow_sync_fence = fence_get(fence); I wouldn't remember the backup operation at all, just return the fence to the caller instead. > + r = amdgpu_bo_sync_between_bo_and_shadow(adev, ring, bo, > + NULL, &fence); As discussed we don't want to use the scheduler for the recovery operation, so we need a flag to send the commands directly to the ring here. Patch #7 and following. Thinking about this more the whole idea of using a separate entity is a NAK. The basic problem is that you can't know which updates are completed, crashed or in flight when the the GPU recovery hits us. So the shadow needs to be updated before the real page table and none of this can be done out of order. I suggest you just switch back to doing the updates twice for now. Regards, Christian. Am 12.08.2016 um 08:38 schrieb Chunming Zhou: > Since we cannot ensure VRAM is consistent after a GPU reset, page > table shadowing is necessary. Shadowed page tables are, in a sense, a > method to recover the consistent state of the page tables before the > reset occurred. > > We need to allocate GTT bo as the shadow of VRAM bo when creating page table, > and make them the same. After gpu reset, we will need to use SDMA to copy GTT bo > content to VRAM bo, then page table will be recoveried. > > > V2: > Shadow bo uses a shadow entity running on normal run queue, after gpu reset, > we need to wait for all shadow jobs finished first, then recovery page table from shadow. > > V3: > Addressed Christian comments for shadow bo part. > > Chunming Zhou (18): > drm/amdgpu: add shadow bo support V2 > drm/amdgpu: validate shadow as well when validating bo > drm/amdgpu: allocate shadow for pd/pt bo V2 > drm/amdgpu: add shadow flag V2 > drm/amdgpu: sync bo and shadow > drm/amdgpu: implement vm recovery function from shadow > drm/amdgpu: add shadow_entity for shadow page table updates > drm/amdgpu: update pd shadow bo > drm/amdgpu: update pt shadow > drm/amd: add last fence in sched entity > drm/amdgpu: link all vm clients > drm/amdgpu: add vm_list_lock > drm/amd: add block entity function > drm/amdgpu: add shadow fence owner > drm/amd: block entity > drm/amdgpu: recover page tables after gpu reset > drm/amdgpu: add need backup function > drm/amdgpu: add backup condition for vm > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 25 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 82 +++++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 88 +++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 100 ++++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 + > drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 3 + > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 281 +++++++++++++++++++------- > drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 38 +++- > drivers/gpu/drm/amd/scheduler/gpu_scheduler.h | 5 + > include/uapi/drm/amdgpu_drm.h | 2 + > 10 files changed, 524 insertions(+), 105 deletions(-) >