Re: [RFC PATCH 3/3] drm/virtio: drm_gem_plane_helper_prepare_fb for obj synchronization

On 8/23/2023 8:52 PM, Dmitry Osipenko wrote:
On 8/20/23 23:58, Kim, Dongwon wrote:
On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
On 7/13/23 01:44, Dongwon Kim wrote:
This helper is needed for framebuffer synchronization. Old framebuffer data is often displayed on the guest display without this helper.

Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>
Cc: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
Signed-off-by: Dongwon Kim <dongwon.kim@xxxxxxxxx>
---
   drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
   1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index a063f06ab6c5..e197299489ce 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -26,6 +26,7 @@
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_damage_helper.h>
 #include <drm/drm_fourcc.h>
+#include <drm/drm_gem_atomic_helper.h>
 
 #include "virtgpu_drv.h"
 
@@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
 	vgfb = to_virtio_gpu_framebuffer(new_state->fb);
 	vgplane_st = to_virtio_gpu_plane_state(new_state);
 	bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
+
+	drm_gem_plane_helper_prepare_fb(plane, new_state);
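For context, the helper attaches the BO's implicit fences to the plane state so the atomic commit waits for rendering before scanout. Below is a simplified conceptual sketch of that idea; it is not the exact code in drivers/gpu/drm/drm_gem_atomic_helper.c, which handles more cases, and the function name is only for illustration:

    #include <linux/dma-resv.h>
    #include <drm/drm_atomic_uapi.h>
    #include <drm/drm_framebuffer.h>
    #include <drm/drm_gem.h>

    /* Conceptual sketch of what drm_gem_plane_helper_prepare_fb() does. */
    static int prepare_fb_sketch(struct drm_plane *plane,
                                 struct drm_plane_state *state)
    {
            struct drm_gem_object *obj;
            struct dma_fence *fence;
            int ret;

            if (!state->fb)
                    return 0;

            obj = state->fb->obj[0];

            /* Collect the pending implicit write fences on the BO... */
            ret = dma_resv_get_singleton(obj->resv, DMA_RESV_USAGE_WRITE,
                                         &fence);
            if (ret)
                    return ret;

            /* ...and hand them to the atomic core, which waits for them
             * to signal before committing the new framebuffer. */
            drm_atomic_set_fence_for_plane(state, fence);

            return 0;
    }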
The implicit display BO sync should happen on the host side, unless you're rendering with Venus and then displaying with virgl. Doing it on the guest side would be a major performance hit. Please provide a complete description of your setup: what VMM you use, config options, what tests you're running.
In our setup we use virtio-gpu as a KMS device while using i915 as the render device, and we use QEMU as the VMM. The virtio-gpu driver flushes the scanout to QEMU as a blob resource (a reference to the buffer). QEMU then creates a dmabuf for the blob using udmabuf and renders it as a texture using OpenGL. The test I ran is simple: just starting a terminal app and typing things to see if there is any frame regression. I believe this helper is required since the BO on the guest is basically a dmabuf being shared between i915 and the virtio-gpu driver. I didn't think about the performance impact. If the impact is too large and that is not acceptable, are there any other suggestions or some tests I can try?
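For reference, the udmabuf step on the host side looks roughly like the sketch below. This is a minimal sketch assuming the blob pages are backed by a memfd sealed with F_SEAL_SHRINK (as /dev/udmabuf requires); the helper name is mine, not QEMU's:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/udmabuf.h>

    /* Wrap a page-aligned range of a sealed memfd into a dma-buf fd. */
    static int udmabuf_from_memfd(int memfd, __u64 offset, __u64 size)
    {
            struct udmabuf_create create = {
                    .memfd  = memfd,
                    .flags  = UDMABUF_FLAGS_CLOEXEC,
                    .offset = offset,   /* must be page-aligned */
                    .size   = size,     /* must be page-aligned */
            };
            int devfd = open("/dev/udmabuf", O_RDWR);
            int buffd;

            if (devfd < 0)
                    return -1;

            /* On success the ioctl returns the new dma-buf fd. */
            buffd = ioctl(devfd, UDMABUF_CREATE, &create);
            close(devfd);
            return buffd;
    }

QEMU then imports that fd into EGL and samples it as a texture.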
You can do a fence-wait in the guest userspace/Mesa after blitting/drawing to the udmabuf.
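For example, dma-buf fds support poll(): POLLIN becomes ready once the implicit write fences on the buffer have signaled, i.e. once rendering into it has finished. A minimal sketch (the function name is only for illustration):

    #include <errno.h>
    #include <poll.h>

    /* Block until all implicit write fences on the dma-buf signal. */
    static int wait_for_render_done(int dmabuf_fd, int timeout_ms)
    {
            struct pollfd pfd = {
                    .fd     = dmabuf_fd,
                    .events = POLLIN,
            };
            int ret;

            do {
                    ret = poll(&pfd, 1, timeout_ms);
            } while (ret < 0 && errno == EINTR);

            return ret == 1 ? 0 : -1;   /* 0 means the poll timed out */
    }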

There is already synchronization between QEMU and the virtio-gpu driver on the guest. Upon resource flush, the virtio-gpu driver waits for QEMU's response to the message, and QEMU sends the response once rendering is done. The problem we are seeing is not that the rendering side reuses the buffer before it is displayed by QEMU. The problem is rather that a frame is often not yet finished when "resource flush" is issued. So unless there is a way for QEMU to wait on this fence (guest DRM), I think we need a synchronization point on the guest side.

I saw that other DRM drivers (omap, tegra, vc4 and so on) do something similar, so I guess this is a generic solution for such cases. But I do understand your concern, since the primary use case of the virtio-gpu driver is virgl and this extra wait would cost some performance. I have a couple of points here as well.

1. Wouldn't the extra wait caused by drm_gem_plane_helper_prepare_fb be minimal, since the actual rendering is done on the host?

2. Can we make this helper be called only if virgl is not used as the 3D driver? (A rough sketch of what I mean is below.)
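To illustrate point 2, an untested sketch using the existing has_virgl_3d capability flag in struct virtio_gpu_device; this is just to show the idea, not a tested change:

    static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
                                           struct drm_plane_state *new_state)
    {
            struct virtio_gpu_device *vgdev = plane->dev->dev_private;
            ...
            /* Only take the implicit-fence wait for the non-virgl
             * (blob scanout / GPU priming) path; virgl keeps doing
             * its synchronization on the host side. */
            if (!vgdev->has_virgl_3d)
                    drm_gem_plane_helper_prepare_fb(plane, new_state);
            ...
    }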


You may run popular vk/gl gfx benchmarks using gl/sdl outputs to see the
fps impact.

ok


Virglrenderer today supports native contexts. The method you're using for GPU priming has been shown to be slow compared to multi-GPU native contexts. There is ongoing work on fence passing from guest to host [1], which allows fence-syncing to be done on the host. You'll find links to the WIP virtio-intel native context in [1] as well. You won't find GPU priming support using native contexts in [1]; those patches haven't been published yet.

[1] https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138

Note that in general it's not acceptable to upstream patches that serve downstream only. Your display sync issue is irrelevant to the upstream stack unless you're going to upstream all the VMM and guest userspace patches, and in that case you should always publish all the patches and provide links.

So, you need to check the performance impact and publish all the patches
to the relevant upstream projects.

QEMU has all the patches for this (blob scanout support), but the guest Mesa patch for KMSRO is still outstanding:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

I understand this specific patch will need more discussion/justification, but what about the other two? Are you generally OK with those in the same series?

Thanks!!




