[PATCH 0/8] shadow page table support V5 ---> shadow bo support

deathsimple@xxxxxxxxxxx (Christian König) · Wed, 17 Aug 2016 16:16:32 +0200

Errors should actually be reported by the caller, not here.

So we should probably remove that DRM_ERROR here as well.

Christian.

Am 17.08.2016 um 16:10 schrieb StDenis, Tom:
>
> Why not be consistent and add a DRM_ERROR on all paths that include an 
> error?  e.g. instead of
>
>
> if (r)
>
>    goto error_free;
>
>
> Throw a DRM_ERROR("") in there.
>
>
> Tom
>
>
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of 
> Christian KÃ¶nig <deathsimple at vodafone.de>
> *Sent:* Wednesday, August 17, 2016 10:04
> *To:* Zhou, David(ChunMing); amd-gfx at lists.freedesktop.org
> *Subject:* Re: [PATCH 0/8] shadow page table support V5 ---> shadow bo 
> support
> Patch #1:
>
> Could be that we need to add another module parameter to control this,
> but I think for now that should be sufficient.
>
> Patch is Reviewed-by: Christian KÃ¶nig <christian.koenig at amd.com>
>
> Patch #2:
>
> > +     if (direct_submit) {
> > +             r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
> > +                                    NULL, NULL, fence);
> > +             job->fence = fence_get(*fence);
> > +             if (r)
> > +                     DRM_ERROR("Error scheduling IBs (%d)\n", r);
> > +             amdgpu_job_free(job);
>
> When there is an error you should return the code as well.
>
> > +     } else {
> > +             r = amdgpu_job_submit(job, ring, &adev->mman.entity,
> > + AMDGPU_FENCE_OWNER_UNDEFINED, fence);
> > +             if (r)
> > +                     goto error_free;
> > +     }
> >
> >        return 0;
>
> Just changing this to to "return r;" should be sufficient.
>
> With that fixed the patch is Reviewed-by: Christian KÃ¶nig
> <christian.koenig at amd.com> as well.
>
> Patch #3:
>
> You mentioned that this isn't used during command submission but rather
> during GPU reset, correct?
>
> In this case I would advise not to use a member in the BO structure to
> note the backup direction and instead have one function for back and one
> for restoring the content (or note that as a parameter to the function).
>
> Otherwise we could run into trouble when the CS wants to backup
> something the GPU reset wants to restore at the same time.
>
> Patch #4: Was already reviewed by me.
>
> Patch #5:
>
> > +     pt = params->shadow ? vm->page_tables[pt_idx].entry.robj->shadow :
> > + vm->page_tables[pt_idx].entry.robj;
> You need to double check here if shadows are actually allocated or not.
> Otherwise we will crash on an APU.
>
> > +     /* double ndw, since need to update shadow pt bo as well */
> > +     ndw *= 2;
> > +
> We don't need to double the IB size, but only the number of commands in
> it (ncmds).
>
> The difference is when we want to write scattered GART entries. In this
> case I've added the PTEs to the end of the IB.
>
> Patch #6:
> > +     spinlock_t shadow_list_lock;
> We might want to use a mutex here instead. Usually I would agree that a
> spin lock is better for a linked list, but during the restore process in
> a GPU reset we probably want to sleep here.
>
> Alternatively you could splice the list to a local version on the stack,
> grab references to the BOs and then drop the lock during the restore
> process.
>
> > @@ -541,6 +546,13 @@ void amdgpu_bo_unref(struct amdgpu_bo **bo)
> >        if ((*bo) == NULL)
> >                return;
> >
> > +     /* shadow must be freed before bo itself */
> > +     if ((!(*bo)->shadow) && !list_empty(&(*bo)->shadow_list)) {
> > + spin_lock(&(*bo)->adev->shadow_list_lock);
> > + list_del_init(&(*bo)->shadow_list);
> > + spin_unlock(&(*bo)->adev->shadow_list_lock);
> > +
> > +     }
> >        tbo = &((*bo)->tbo);
> >        ttm_bo_unref(&tbo);
> >        if (tbo == NULL)
> That would trigger even when we take a temporary reference. I suggest to
> add a amdgpu_bo_unref_shadow() function instead, unreferencing both the
> shadow and the normal BO.
>
> This can then be used in the VM code to cleanup the shadow created.
>
> Going to skip patch #7 and #8 for now cause our team call begins in a
> moment.
>
> Regards,
> Christian.
>
> Am 17.08.2016 um 08:05 schrieb Chunming Zhou:
> > Since we cannot ensure VRAM is consistent after a GPU reset, page
> > table shadowing is necessary. Shadowed page tables are, in a sense, a
> > method to recover the consistent state of the page tables before the
> > reset occurred.
> >
> > We need to allocate GTT bo as the shadow of VRAM bo when creating 
> page table,
> > and make them the same. After gpu reset, we will need to use SDMA to 
> copy GTT bo
> > content to VRAM bo, then page table will be recoveried.
> >
> >
> > V2:
> > Shadow bo uses a shadow entity running on normal run queue, after 
> gpu reset,
> > we need to wait for all shadow jobs finished first, then recovery 
> page table from shadow.
> >
> > V3:
> > Addressed Christian comments for shadow bo part.
> >
> > V4:
> > Switch back to update page table twice (one of two is for shadow)
> >
> > V5:
> > make it be gerneic shadow bo support. Address Christian comments.
> >
> > Chunming Zhou (8):
> >    drm/amdgpu: add need backup function V2
> >    drm/amdgpu: add direct submision option for copy_buffer
> >    drm/amdgpu: sync bo and shadow V2
> >    drm/amdgpu: update pd shadow while updating pd
> >    drm/amdgpu: update pt shadow while updating pt V2
> >    drm/amdgpu: link all shadow bo
> >    drm/amdgpu: implement recovery vram bo from shadow
> >    drm/amdgpu: recover vram bo from shadow after gpu reset
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 9 ++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 39 ++++++++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 94 
> ++++++++++++++++++++++++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    | 9 +++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_test.c      | 4 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 21 ++++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 69 
> ++++++++++++++------
> >   8 files changed, 215 insertions(+), 33 deletions(-)
> >
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> amd-gfx Info Page - lists.freedesktop.org 
> <https://lists.freedesktop.org/mailman/listinfo/amd-gfx>
> lists.freedesktop.org
> To see the collection of prior postings to the list, visit the amd-gfx 
> Archives. Using amd-gfx: To post a message to all the list members, 
> send email ...
>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160817/00e6675b/attachment.html>