> -----Original Message-----
> From: Daniel Vetter <daniel@xxxxxxxx>
> Sent: Friday, July 30, 2021 6:26 PM
> To: Kasireddy, Vivek <vivek.kasireddy@xxxxxxxxx>
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; Daniel Vetter <daniel@xxxxxxxx>;
> Gerd Hoffmann <kraxel@xxxxxxxxxx>; Pekka Paalanen <ppaalanen@xxxxxxxxx>;
> Simon Ser <contact@xxxxxxxxxxx>; Michel Dänzer <michel@xxxxxxxxxxx>;
> Zhang, Tina <tina.zhang@xxxxxxxxx>; Kim, Dongwon <dongwon.kim@xxxxxxxxx>
> Subject: Re: [RFC v1 0/4] drm: Add support for
> DRM_CAP_DEFERRED_OUT_FENCE capability
>
> On Thu, Jul 29, 2021 at 01:16:55AM -0700, Vivek Kasireddy wrote:
> > Separating the OUT_FENCE signalling from pageflip completion allows a
> > Guest compositor to start a new repaint cycle with a new buffer
> > instead of waiting for the old buffer to be free.
> >
> > This work is based on the idea/suggestion from Simon and Pekka.
> >
> > This capability can be a solution for this issue:
> > https://gitlab.freedesktop.org/wayland/weston/-/issues/514
> >
> > Corresponding Weston MR:
> > https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/668
>
> Uh I kinda wanted to discuss this a bit more before we jump into typing
> code, but well I guess not that much work yet.
>
> So maybe I'm not understanding the problem, but I think the fundamental
> underlying issue is that with KMS you can have at most 2 buffers
> in-flight, due to our queue depth limit of 1 pending flip.
>
> Unfortunately that means for virtual hw, where it takes a few more
> steps/vblanks until the framebuffer actually shows up on screen and is
> scanned out, we suffer deeply. The usual fix for that is to drop the
> latency and increase throughput, and have more buffers in-flight. Which
> this patch tries to do.
>
> Now I think where we go wrong here is that we're trying to hack this up
> by defining different semantics for the out-fence and for the drm-event.
> Imo that's wrong, they're both meant to show exactly the same thing:
> - when is the new frame actually visible to the user (as in, eyeballs
>   in a human head, preferably, not the time when we've handed the
>   buffer off to the virtual hw)
> - when is the previous buffer no longer being used by the scanout hw
>
> We do cheat a bit right now insofar as we assume they're both the same,
> as in, panel-side latency is currently the compositor's problem to
> figure out.
>
> So for virtual hw I think the timestamp and event completion really
> need to happen only when the buffer has been pushed through the entire
> virtualization chain, i.e. ideally we get the timestamp from the kms
> driver on the host side. Currently that's not done, so this is most
> likely quite broken already (virtio relies on the no-vblank auto event
> sending, which definitely doesn't wait for anything, or I'm completely
> missing something).

Agree. One lesson we learned from previous direct-display related work is
that using the host hardware event is kind of a "must". Otherwise,
problems like flickering, tearing, or frame drops will show up. Besides,
since the wayland-ui works as a Weston client, it needs more than 2
buffers to support full-frame redraws. I tried weston-simple-dmabuf-egl
with 2 buffers and it was bad.

BR,
Tina

> I think instead of hacking up some ill-defined 1.5 queue depth support,
> what we should do is support queue depth > 1 properly. So:
>
> - Change atomic to support queue depth > 1, this needs to be a
>   per-driver thing due to a bunch of issues in driver code. Essentially
>   drivers must never look at obj->state pointers, and only ever look up
>   state through the passed-in drm_atomic_state * update container.
>
> - Aside: virtio should lose all its empty hooks, there's no point in
>   that.
>
> - We fix virtio to send out the completion event at the end of this
>   entire pipeline, i.e.
> virtio code needs to take care of sending out the crtc_state->event
> correctly.
>
> - We probably also want some kind of (maybe per-crtc) recommended queue
>   depth property so compositors know how many buffers to keep in
>   flight. Not sure about that.
>
> It's a bit more work, but also a lot less hacking around infrastructure
> in dubious ways.
>
> Thoughts?
>
> Cheers, Daniel
>
> > Cc: Daniel Vetter <daniel@xxxxxxxx>
> > Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> > Cc: Pekka Paalanen <ppaalanen@xxxxxxxxx>
> > Cc: Simon Ser <contact@xxxxxxxxxxx>
> > Cc: Michel Dänzer <michel@xxxxxxxxxxx>
> > Cc: Tina Zhang <tina.zhang@xxxxxxxxx>
> > Cc: Dongwon Kim <dongwon.kim@xxxxxxxxx>
> >
> > Vivek Kasireddy (4):
> >   drm: Add a capability flag to support deferred out_fence signalling
> >   virtio-gpu uapi: Add VIRTIO_GPU_F_OUT_FENCE feature
> >   drm/virtio: Add VIRTIO_GPU_CMD_RESOURCE_OUT_FENCE cmd
> >   drm/virtio: Probe and implement VIRTIO_GPU_F_OUT_FENCE feature
> >
> >  drivers/gpu/drm/drm_file.c               | 11 +++---
> >  drivers/gpu/drm/drm_ioctl.c              |  3 ++
> >  drivers/gpu/drm/virtio/virtgpu_debugfs.c |  1 +
> >  drivers/gpu/drm/virtio/virtgpu_drv.c     |  1 +
> >  drivers/gpu/drm/virtio/virtgpu_drv.h     |  6 ++++
> >  drivers/gpu/drm/virtio/virtgpu_fence.c   |  9 +++++
> >  drivers/gpu/drm/virtio/virtgpu_kms.c     | 10 ++++--
> >  drivers/gpu/drm/virtio/virtgpu_plane.c   | 44 +++++++++++++++++++++++-
> >  drivers/gpu/drm/virtio/virtgpu_vq.c      | 17 +++++++++
> >  include/drm/drm_mode_config.h            |  9 +++++
> >  include/uapi/drm/drm.h                   |  1 +
> >  include/uapi/linux/virtio_gpu.h          | 12 +++++++
> >  12 files changed, 117 insertions(+), 7 deletions(-)
> >
> > --
> > 2.30.2
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch