Re: [PATCH v2 3/4] drm/vc4: Remove BOs seqnos

Hi Melissa,

Thanks for your review!

On 16/12/24 17:25, Melissa Wen wrote:
On 12/12, Maíra Canal wrote:
`bo->seqno`, `bo->write_seqno`, and `exec->bin_dep_seqno` are leftovers
from a time when VC4 didn't support DMA Reservation Objects. When they
were introduced, it made sense to track the correspondence between
the BOs and the jobs through the job's seqno.

This is no longer needed, as VC4 already has support for DMA Reservation
Objects and attaches the "job done" fence to the BOs.

Signed-off-by: Maíra Canal <mcanal@xxxxxxxxxx>
---
  drivers/gpu/drm/vc4/vc4_crtc.c     | 22 +++++++++++-----------
  drivers/gpu/drm/vc4/vc4_drv.h      | 13 -------------
  drivers/gpu/drm/vc4/vc4_gem.c      | 18 ++----------------
  drivers/gpu/drm/vc4/vc4_validate.c | 11 -----------
  4 files changed, 13 insertions(+), 51 deletions(-)
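
For context, the "job done" fence is already attached to every BO's
reservation object at submission time, which is what makes the seqnos
redundant. A simplified sketch of that submission-side logic follows;
the helper name vc4_attach_job_fence() is made up for illustration and
the vc4_exec_info field names are quoted from memory, so treat it as a
sketch rather than the exact code:

	/* Sketch: attach the job's "done" fence to each BO's reservation
	 * object, so implicit synchronization no longer needs per-BO seqnos.
	 */
	static void vc4_attach_job_fence(struct vc4_exec_info *exec)
	{
		unsigned int i;

		/* Every BO referenced by the job is at least read by the GPU. */
		for (i = 0; i < exec->bo_count; i++) {
			struct vc4_bo *bo = to_vc4_bo(exec->bo[i]);

			dma_resv_add_fence(bo->base.base.resv, exec->fence,
					   DMA_RESV_USAGE_READ);
		}

		/* BOs written by the render job get a WRITE-level fence. */
		for (i = 0; i < exec->rcl_write_bo_count; i++) {
			struct vc4_bo *bo = exec->rcl_write_bo[i];

			dma_resv_add_fence(bo->base.base.resv, exec->fence,
					   DMA_RESV_USAGE_WRITE);
		}
	}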

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index cf40a53ad42e..3a5b5743cb2f 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -919,10 +919,11 @@ vc4_async_page_flip_complete(struct vc4_async_flip_state *flip_state)
  	kfree(flip_state);
  }
-static void vc4_async_page_flip_seqno_complete(struct vc4_seqno_cb *cb)
+static void vc4_async_page_flip_seqno_complete(struct dma_fence *fence,
+					       struct dma_fence_cb *cb)
  {
  	struct vc4_async_flip_state *flip_state =
-		container_of(cb, struct vc4_async_flip_state, cb.seqno);
+		container_of(cb, struct vc4_async_flip_state, cb.fence);

hmm... I didn't get why you still call this function
vc4_async_page_flip_seqno_complete if you are not using seqno anymore.

I'll rename it. Thanks!


On top of that, can we just use vc4_async_page_flip_fence_complete()
instead? It looks like we no longer need two different async page flip
paths depending on the hardware generation.

`vc4_async_page_flip_seqno_complete()` updates the BO usecnt, which is
needed for VC4 (GEN == 4). Maxime, Dave, could you confirm that this
usecnt logic is still needed?
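
For reference, the GEN_4 path has to pair the vc4_bo_inc_usecnt() taken
when the async update grabs the framebuffer with a vc4_bo_dec_usecnt()
once the flip completes, which is why it can't simply reuse the fence
variant. A simplified sketch of the handler as it would look after this
patch (details trimmed, so consider it illustrative):

	static void vc4_async_page_flip_seqno_complete(struct dma_fence *fence,
						       struct dma_fence_cb *cb)
	{
		struct vc4_async_flip_state *flip_state =
			container_of(cb, struct vc4_async_flip_state, cb.fence);
		struct vc4_bo *bo = NULL;

		if (flip_state->old_fb) {
			struct drm_gem_dma_object *dma_bo =
				drm_fb_dma_get_gem_obj(flip_state->old_fb, 0);

			bo = to_vc4_bo(&dma_bo->base);
		}

		vc4_async_page_flip_complete(flip_state);

		/* Balance the vc4_bo_inc_usecnt() done in the async update path. */
		if (bo)
			vc4_bo_dec_usecnt(bo);
	}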


  	struct vc4_bo *bo = NULL;

  	if (flip_state->old_fb) {
@@ -961,16 +962,15 @@ static int vc4_async_set_fence_cb(struct drm_device *dev,
  {
  	struct drm_framebuffer *fb = flip_state->fb;
  	struct drm_gem_dma_object *dma_bo = drm_fb_dma_get_gem_obj(fb, 0);
+	dma_fence_func_t async_page_flip_complete_function;
  	struct vc4_dev *vc4 = to_vc4_dev(dev);
  	struct dma_fence *fence;
  	int ret;

-	if (vc4->gen == VC4_GEN_4) {
-		struct vc4_bo *bo = to_vc4_bo(&dma_bo->base);
-
-		return vc4_queue_seqno_cb(dev, &flip_state->cb.seqno, bo->seqno,
-					  vc4_async_page_flip_seqno_complete);
-	}
+	if (vc4->gen == VC4_GEN_4)
+		async_page_flip_complete_function = vc4_async_page_flip_seqno_complete;
+	else
+		async_page_flip_complete_function = vc4_async_page_flip_fence_complete;

  	ret = dma_resv_get_singleton(dma_bo->base.resv, DMA_RESV_USAGE_READ, &fence);
  	if (ret)
@@ -978,14 +978,14 @@ static int vc4_async_set_fence_cb(struct drm_device *dev,

  	/* If there's no fence, complete the page flip immediately */
  	if (!fence) {
-		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
+		async_page_flip_complete_function(fence, &flip_state->cb.fence);
  		return 0;
  	}

  	/* If the fence has already been completed, complete the page flip */
  	if (dma_fence_add_callback(fence, &flip_state->cb.fence,
-				   vc4_async_page_flip_fence_complete))
-		vc4_async_page_flip_fence_complete(fence, &flip_state->cb.fence);
+				   async_page_flip_complete_function))
+		async_page_flip_complete_function(fence, &flip_state->cb.fence);

  	return 0;
  }

[...]

diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 4037c65eb269..1cbd95c4f9ef 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c

[...]

@@ -845,12 +837,6 @@ vc4_get_bcl(struct drm_device *dev, struct vc4_exec_info *exec)
  			goto fail;
  	}

-	/* Block waiting on any previous rendering into the CS's VBO,
-	 * IB, or textures, so that pixels are actually written by the
-	 * time we try to read them.
-	 */
-	ret = vc4_wait_for_seqno(dev, exec->bin_dep_seqno, ~0ull, true);

Don't we still need to wait for the fences here?


As we indicate the implicit DMA reservation usage through the `enum
dma_resv_usage` flag, I believe it is safe to assume that the BOs won't
be read before we finish writing to them.
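
For reference, the usage level already encodes the implicit-sync rule:
a consumer that only reads has to wait for WRITE-level fences, while a
writer has to wait for both readers and writers. A minimal sketch of
that rule using the dma-resv helpers (the wrapper name
vc4_wait_bo_idle() is made up here for illustration):

	/* Illustrative only: wait for the fences that implicit sync requires
	 * before accessing the BO, based on whether we are going to write it.
	 */
	static long vc4_wait_bo_idle(struct drm_gem_object *obj, bool write)
	{
		/* dma_resv_usage_rw(): writers wait on READ + WRITE fences,
		 * readers only wait on WRITE fences.
		 */
		return dma_resv_wait_timeout(obj->resv, dma_resv_usage_rw(write),
					     true, MAX_SCHEDULE_TIMEOUT);
	}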

Best Regards,
- Maíra

--
2.47.1




