On Wed, 2022-04-20 at 20:56 +0200, Christian König wrote:
> ⚠ External Email
>
> Am 20.04.22 um 20:49 schrieb Christian König:
> > Am 20.04.22 um 20:41 schrieb Zack Rusin:
> > > On Wed, 2022-04-20 at 19:40 +0200, Christian König wrote:
> > > > Am 20.04.22 um 19:38 schrieb Zack Rusin:
> > > > > On Wed, 2022-04-20 at 09:37 +0200, Christian König wrote:
> > > > > > ⚠ External Email
> > > > > >
> > > > > > Hi Zack,
> > > > > >
> > > > > > Am 20.04.22 um 05:56 schrieb Zack Rusin:
> > > > > > > On Thu, 2022-04-07 at 10:59 +0200, Christian König wrote:
> > > > > > > > Rework the internals of the dma_resv object to allow
> > > > > > > > adding more than one write fence and remember for each
> > > > > > > > fence what purpose it had.
> > > > > > > >
> > > > > > > > This allows removing the workaround from amdgpu which
> > > > > > > > used a container for this instead.
> > > > > > > >
> > > > > > > > Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> > > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > > > > > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > afaict this change broke vmwgfx, which now kernel oopses
> > > > > > > right after boot. I haven't had the time to look into it
> > > > > > > yet, so I'm not sure what the problem is. I'll look at
> > > > > > > this tomorrow, but just in case you have some clues, the
> > > > > > > backtrace follows:
> > > > > > that's a known issue and should already be fixed with:
> > > > > >
> > > > > > commit d72dcbe9fce505228dae43bef9da8f2b707d1b3d
> > > > > > Author: Christian König <christian.koenig@xxxxxxx>
> > > > > > Date:   Mon Apr 11 15:21:59 2022 +0200
> > > > > Unfortunately that doesn't seem to be it. The backtrace is
> > > > > from the current (as of the time of sending of this email)
> > > > > drm-misc-next, which has this change, so it's something else.
> > > > Ok, that's strange. In this case I need to investigate further.
> > > >
> > > > Maybe VMWGFX is adding more than one fence and we actually need
> > > > to reserve multiple slots.
> > > This might be a helper code issue with CONFIG_DEBUG_MUTEXES set.
> > > With that config dma_resv_reset_max_fences does:
> > > fences->max_fences = fences->num_fences;
> > > For some objects num_fences is 0, so afterwards max_fences and
> > > num_fences are both 0, and then BUG_ON(num_fences >= max_fences)
> > > is triggered.
> >
> > Yeah, but that's expected behavior.
> >
> > What's not expected is that max_fences is still 0 (or equal to the
> > old num_fences) when VMWGFX tries to add a new fence. The function
> > ttm_eu_reserve_buffers() should have reserved at least one fence
> > slot.
> >
> > So the underlying problem is that either ttm_eu_reserve_buffers()
> > was never called or VMWGFX tried to add more than one fence.
> >
> To figure out what it is could you try the following code fragment:
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
> index f46891012be3..a36f89d3f36d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
> @@ -288,7 +288,7 @@ int vmw_validation_add_bo(struct vmw_validation_context *ctx,
>                 val_buf->bo = ttm_bo_get_unless_zero(&vbo->base);
>                 if (!val_buf->bo)
>                         return -ESRCH;
> -               val_buf->num_shared = 0;
> +               val_buf->num_shared = 16;
>                 list_add_tail(&val_buf->head, &ctx->bo_list);
>                 bo_node->as_mob = as_mob;
>                 bo_node->cpu_blit = cpu_blit;

Fails the same BUG_ON with num_fences and max_fences == 0.

z
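A minimal sketch of the reserve-before-add contract discussed above,
assuming a driver-side helper: my_driver_attach_fence and the
single-slot reservation are hypothetical and not from the thread,
while dma_resv_lock(), dma_resv_reserve_fences(), dma_resv_add_fence()
and DMA_RESV_USAGE_WRITE are the post-rework dma_resv entry points.

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Hypothetical helper: attach one fence to a buffer's reservation
 * object.  dma_resv_add_fence() requires a previously reserved slot;
 * with CONFIG_DEBUG_MUTEXES, dma_resv_reset_max_fences() sets
 * max_fences back to num_fences (as noted above), so a missing
 * reservation trips BUG_ON(num_fences >= max_fences).
 */
static int my_driver_attach_fence(struct dma_resv *resv,
                                  struct dma_fence *fence)
{
        int ret;

        ret = dma_resv_lock(resv, NULL);
        if (ret)
                return ret;

        /* Reserve a slot for each fence we intend to add. */
        ret = dma_resv_reserve_fences(resv, 1);
        if (!ret)
                dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_WRITE);

        dma_resv_unlock(resv);
        return ret;
}

In the TTM execbuf path that reservation is normally done for the
whole validation list by ttm_eu_reserve_buffers(), which is what the
quoted mail expects to have happened before any fence is added.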