On Wed, Oct 09, 2019 at 03:10:09PM +0200, Christian König wrote:
> On 08.10.19 at 11:25, Daniel Vetter wrote:
> > On Thu, Aug 29, 2019 at 04:29:15PM +0200, Christian König wrote:
> > > This way we can even pipeline imported BO evictions.
> > >
> > > v2: Limit this to only cases when the parent object uses a separate
> > > reservation object as well. This fixes another OOM problem.
> > >
> > > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> > Since I read quite a bit of ttm I figured I'll review this too, but I'm
> > totally lost. And git blame gives me at best commits with one-liner commit
> > messages, and the docs aren't explaining much at all either (and generally
> > they didn't get updated at all with all the changes in the past years).
> >
> > I have a vague idea of what you're doing here, but not enough to do review
> > with any confidence. And from other ttm patches from amd it feels a lot
> > like we have essentially a bus factor of 1 for all things ttm :-/
>
> Yeah, that's one of a couple of reasons why I want to get rid of TTM in the
> long term.
>
> Basically this is a bug fix for delay freeing ttm objects. When we hang the
> ttm object on a ghost object to be freed and the ttm object is an imported
> DMA-buf we run into the problem that we want to drop the mapping, but have
> the wrong lock taken (the lock of the ghost and not of the parent).

Got intrigued, did some more digging. I guess the bugfix part is related
to:

commit 841e763b40764a7699ae07f4cb1921af62d6316d
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Thu Jul 20 20:55:06 2017 +0200

    drm/ttm: individualize BO reservation obj when they are freed

and that's why you switch everything over to using _resv instead of the
pointer. But then I still don't follow the details ...

>
> Regards,
> Christian.
>
> > -Daniel
> >
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_bo_util.c | 16 +++++++++-------
> > >   1 file changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > index fe81c565e7ef..2ebe9fe7f6c8 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > @@ -517,7 +517,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
> > >  	kref_init(&fbo->base.kref);
> > >  	fbo->base.destroy = &ttm_transfered_destroy;
> > >  	fbo->base.acc_size = 0;
> > > -	fbo->base.base.resv = &fbo->base.base._resv;
> > > +	if (bo->base.resv == &bo->base._resv)
> > > +		fbo->base.base.resv = &fbo->base.base._resv;

I got confused a bit at first, until I spotted the

	fbo->base = *bo;

somewhere above. So I think that part makes sense, together with the above
cited patch. I think at least, confidence on this is very low ...

> > > +
> > >  	dma_resv_init(fbo->base.base.resv);
> > >  	ret = dma_resv_trylock(fbo->base.base.resv);

Shouldn't this be switched over to _resv too? Otherwise feels like
unbalanced locking.
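
To make the locking concern concrete, here is a toy userspace model of the pattern. It is not kernel code and not the actual TTM implementation: `toy_resv`, `toy_obj`, `toy_transfer`, `toy_cleanup` and `toy_balanced` are hypothetical names, and the "lock" is just a depth counter. It only assumes what the quoted hunks show: the ghost starts as a struct copy of the parent, `resv` is re-pointed only when the parent used its own embedded `_resv`, the lock is taken through the `resv` pointer, and the unlock goes to the embedded `_resv`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a reservation object: only tracks lock depth. */
struct toy_resv {
	int lock_depth;
};

/* Toy stand-in for a buffer object: a pointer plus an embedded resv. */
struct toy_obj {
	struct toy_resv *resv;	/* points at _resv or at a shared resv */
	struct toy_resv _resv;	/* embedded, always owned by this object */
};

static void toy_lock(struct toy_resv *r)   { r->lock_depth++; }
static void toy_unlock(struct toy_resv *r) { r->lock_depth--; }

/*
 * Shape of the patched transfer path (assumption, simplified):
 * struct copy, re-point resv only if the parent used its own
 * embedded resv, then lock through the pointer.
 */
static void toy_transfer(const struct toy_obj *parent, struct toy_obj *ghost)
{
	*ghost = *parent;		/* ghost->resv now aliases parent->resv */
	ghost->_resv.lock_depth = 0;	/* toy-only: start with a fresh embedded resv */
	if (parent->resv == &parent->_resv)
		ghost->resv = &ghost->_resv;
	toy_lock(ghost->resv);
}

/* Shape of the patched cleanup path: always unlock the embedded resv. */
static void toy_cleanup(struct toy_obj *ghost)
{
	toy_unlock(&ghost->_resv);
}

/* Returns true when the lock was released on the object it was taken on. */
static bool toy_balanced(bool imported)
{
	struct toy_resv shared = { 0 };
	struct toy_obj parent = { 0 }, ghost;

	parent.resv = imported ? &shared : &parent._resv;
	toy_transfer(&parent, &ghost);
	toy_cleanup(&ghost);
	return shared.lock_depth == 0 && ghost._resv.lock_depth == 0;
}
```

In this model `toy_balanced(false)` holds, while `toy_balanced(true)` does not: the lock lands on the shared object and the unlock on the ghost's embedded one. A real trylock on a lock the parent already holds would fail rather than nest, so the counter only illustrates which object each call touches, not the actual kernel behavior.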
> > >  	WARN_ON(!ret);
> > > @@ -716,7 +718,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
> > >  		if (ret)
> > >  			return ret;
> > > -		dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
> > > +		dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
> > >  		/**
> > >  		 * If we're not moving to fixed memory, the TTM object
> > > @@ -729,7 +731,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
> > >  		else
> > >  			bo->ttm = NULL;
> > > -		ttm_bo_unreserve(ghost_obj);
> > > +		dma_resv_unlock(&ghost_obj->base._resv);
> > >  		ttm_bo_put(ghost_obj);
> > >  	}
> > > @@ -772,7 +774,7 @@ int ttm_bo_pipeline_move(struct ttm_buffer_object *bo,
> > >  		if (ret)
> > >  			return ret;
> > > -		dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
> > > +		dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
> > >  		/**
> > >  		 * If we're not moving to fixed memory, the TTM object
> > > @@ -785,7 +787,7 @@ int ttm_bo_pipeline_move(struct ttm_buffer_object *bo,
> > >  		else
> > >  			bo->ttm = NULL;
> > > -		ttm_bo_unreserve(ghost_obj);
> > > +		dma_resv_unlock(&ghost_obj->base._resv);

I guess dropping the lru part here (aside from switching from ->resv to
->_resv, which is your bugfix I think) doesn't matter since the ghost
object got all cleared up and isn't on any lists anyway? OTOH how does it
work then ... Not clear to me why this is safe.
> > >  		ttm_bo_put(ghost_obj);
> > >  	} else if (from->flags & TTM_MEMTYPE_FLAG_FIXED) {
> > > @@ -841,7 +843,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
> > >  	if (ret)
> > >  		return ret;
> > > -	ret = dma_resv_copy_fences(ghost->base.resv, bo->base.resv);
> > > +	ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv);
> > >  	/* Last resort, wait for the BO to be idle when we are OOM */
> > >  	if (ret)
> > >  		ttm_bo_wait(bo, false, false);
> > > @@ -850,7 +852,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
> > >  	bo->mem.mem_type = TTM_PL_SYSTEM;
> > >  	bo->ttm = NULL;
> > > -	ttm_bo_unreserve(ghost);
> > > +	dma_resv_unlock(&ghost->base._resv);
> > >  	ttm_bo_put(ghost);
> > >  	return 0;
> > > --
> > > 2.17.1

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch