Re: [RFC PATCH 3/3] drm/virtio: drm_gem_plane_helper_prepare_fb for obj synchronization

"Kim, Dongwon" <dongwon.kim@xxxxxxxxx> · Tue, 5 Sep 2023 14:08:46 -0700

Hi Dmitry,

On 8/31/2023 3:51 PM, Dmitry Osipenko wrote:
On 8/24/23 20:58, Kim, Dongwon wrote:
...
You can do fence-wait in the guest userspace/Mesa after blitting/drawing
to the udmabuf.
There is already synchronization between QEMU and virtio-gpu driver on
the guest. Upon resource flush, virtio-gpu waits for the response for
the message from the QEMU and QEMU sends out the response once rendering
is done. The problem we are seeing is not that the rendering part is
reusing the buffer before it's displayed by the QEMU. Problem we are
facing is more like some frame is often not finished when
"resource-flush" is issued. So unless there is a way for QEMU to wait
for this fence (guest drm), I think we should have some synchronization
point in the guest side.

I saw other DRM drivers, omap, tegra, vc4 and so on are doing the
similar so I guess this is a generic solution for such cases. But I do
understand your concern as the primary use case of virtio-gpu driver is
for virgl. So this extra wait would cost some performance drop. But I
have a couple of points here as well.

1. Wouldn't this extra wait caused by drm_gem_plane_helper_prepare_fb be
minimal as the actual
rendering is done in the host?

2. Can we just make this helper called only if virgl is not used as 3D
driver?
The problem you described above shouldn't be resolved by your patch. You
need to wait for FB to be released by the host's display and not to
before GPU finished rendering on guest. I.e. you're swapping display
buffers and your dGPU starts rendering to the buffer that is in active
use by host's display, correct?

I don't believe the guest will start rendering on the same FB while host is
consuming it because the virtio-gpu driver on the guest won't release the FB for the next
frame before it gets the virtio resp for the resource flush command and the host (QEMU)
will hold the response until the rendering is finished.

And having this helper clearly fixes the issue we encountered (some old frame is
sometimes shown..). But you might be right. The way I understood the original problem
might not be 100% correct as it's based on my assumption on how things were fixed
with addition of the helper. Then can you help me to approach to the real problem?
If it's really fixed by the helper, then what would the original issue be?

Maybe you need to do glFinish() on host after swapping buffers? But that
will block host for a long time.

I can try glFinish everytime after it draws a frame on the host. But..
If that is the issue, then wouldn't I expect some skipped frames, not old frames?

For now I don't have solution.

Virglrender today supports native contexts. The method you're using for
GPU priming was proven to be slow in comparison to multi-gpu native
contexts. There is ongoing work for supporting fence passing from guest
to host [1] that allows to do fence-syncing on host. You'll find links
to the WIP virtio-intel native context in [1] as well. You won't find
GPU priming support using native context in [1], patches hasn't been
published yet.

[1]
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138

Note that in general it's not acceptable to upstream patches that serve
downstream only. Yours display sync issue is irrelevant to the upstream
stack unless you're going to upstream all the VMM and guest userspace
patches, and in such case you should always publish all the patches and
provide links.

So, you need to check the performance impact and publish all the patches
to the relevant upstream projects.
QEMU has all patches regarding this (blob scanout support) but guest Mesa
patch for KMSRO is still outstanding.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

I understand this specific patch would need more
discussion/justification but
what about other two, are you generally ok with those in the same series?
The first patch should be dropped. The second one could be useful,

Yes, the first one will be gone. I will upload a revised version of the second one
only with proper explanation.

you'll need to provide step-by-step instruction for how to reproduce the
multi-display issue, please write it in the cover-letter for the next
patch version.

Thanks. Always appreciate your feedback!!