Re: [PATCH] drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

Zack Rusin <zack.rusin@xxxxxxxxxxxx> · Tue, 30 Jan 2024 21:31:29 -0500

On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone <daniel@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, 30 Jan 2024 at 18:39, Zack Rusin <zack.rusin@xxxxxxxxxxxx> wrote:
> > In general, yes. Of course it's a little more convoluted because we'll
> > act like an OpenGL runtime here (e.g. glXSwapBuffers), i.e. our driver
> > will fake page-flips because the only memory we'll have is a single
> > buffer as the actual page-flipping happens in the presentation code on
> > the host. So the guest is not aware of the actual presentation (it's
> > also why we don't have any sort of vblank signaling in vmwgfx, the
> > concept just doesn't exist for us). On para-virtualized drivers
> > the actual page-flips will be a property of the presentation code that's
> > outside of the guest. It's definitely one of those things that I've
> > long wanted a good solution for, in particular to have a better
> > story behind vblank handling, but it's difficult because
> > "presentation" on VMs is in general difficult to define: it might be
> > some VNC-connected host on another continent. Having said that,
> > that's basically a wonky VRR display, so we should be able to handle
> > our presentation as VRR and give more control of updates to the guest,
> > but we haven't done it yet.
>
> Please don't.
>
> Photon time is _a_ useful metric, but only backwards-informational.
> It's nice to give userspace a good forward estimate of when pixels
> will hit retinas, but as it's not fully reliable, the main part is
> being able to let it know when it did happen so it can adjust. Given
> that it's not reliable, we can't use it as a basis for preparing
> submissions though, so we don't, even on bare-metal drivers.
>
> As you've noted though, it really falls apart on non-bare-metal cases,
> especially where latency vastly exceeds throughput, or when either is
> hugely variable. So we don't ever use it as a basis.
>
> VRR is worse though. The FRR model is 'you can display new content
> every $period, and here's your basis so you can calibrate phase'. The
> VRR model is 'you can display new content so rapidly it's not worth
> trying to quantise, just fire it as rapidly as possible'. That's a
> world away from 'errrr ... might be 16ms, might be 500? dunno really'.
>
> The entire model we have is that basis timing flows backwards. The
> 'hardware' gives us a deadline, KMS angles to meet that with a small
> margin, the compositor angles to meet that with a margin again, and it
> lines up client repaints to hit that window too. Everything works on
> that model, so it's not super surprising that using svga is - to quote
> one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

That's very hurtful. Or it would be but of course you didn't believe
them because they're working on Weston so clearly don't make good
choices in general, right? The presentation on esxi is just as smooth
as it is by default on Ubuntu on new hardware...

> Given that the entire ecosystem is based on this model, I don't think
> there's an easy way out where svga just does something wildly
> different. The best way to fix it is probably to work on predictable
> quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> your current throughput, with something similar to hotplug/LINK_STATUS
> and faked EDID to let userspace know when the period changes. If you
> have variability within the cycle, e.g. dropped frames, then just suck
> it up and keep the illusion alive to userspace that it's presenting to
> a fixed period, and if/when you calculate there's a better
> quantisation then let userspace know what it is so it can adjust.
>
> But there's really no future in just doing random presentation rates,
> because that's not the API anyone has written for.
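A rough sketch of the quantisation Daniel suggests, purely for illustration and not vmwgfx code: the candidate rates (60/47/12/5 Hz) come from the mail above, while `pick_quantised_hz()`, `update_rate()`, the use of a measured average host presentation interval, and the hysteresis before advertising a new rate are all hypothetical assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Candidate refresh rates from the suggestion above, highest first. */
static const unsigned int candidate_hz[] = { 60, 47, 12, 5 };

/*
 * Hypothetical helper: given the measured average interval (in
 * microseconds) between host presentations, return the highest
 * candidate rate whose period is no shorter than that interval,
 * i.e. a rate the host can actually sustain. Falls back to the
 * lowest candidate if even that one is too fast.
 */
static unsigned int pick_quantised_hz(uint64_t avg_interval_us)
{
	size_t n = sizeof(candidate_hz) / sizeof(candidate_hz[0]);

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t period_us = 1000000u / candidate_hz[i];

		if (period_us >= avg_interval_us)
			return candidate_hz[i];
	}
	return candidate_hz[n - 1];
}

/* Hypothetical state for deciding when to tell userspace about a change. */
struct rate_state {
	unsigned int current_hz;   /* rate currently advertised to userspace */
	unsigned int pending_hz;   /* candidate rate being observed */
	unsigned int stable_count; /* consecutive evaluations at pending_hz */
};

/*
 * Returns true when the quantised rate has differed from the advertised
 * one for 'hysteresis' consecutive evaluations, i.e. when userspace
 * should be notified (e.g. via a hotplug-style event with a new mode).
 * Short-term jitter below that threshold is simply absorbed.
 */
static bool update_rate(struct rate_state *s, uint64_t avg_interval_us,
			unsigned int hysteresis)
{
	unsigned int hz = pick_quantised_hz(avg_interval_us);

	if (hz == s->current_hz) {
		s->stable_count = 0;
		return false;
	}
	if (hz != s->pending_hz) {
		s->pending_hz = hz;
		s->stable_count = 0;
	}
	if (++s->stable_count < hysteresis)
		return false;
	s->current_hz = hz;
	s->stable_count = 0;
	return true;
}
```

The hysteresis is there so that occasional slow frames are treated as dropped frames on a fixed period rather than as a rate change, which matches the "keep the illusion alive" part of the suggestion.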

See, my hope was that with VRR we could layer the weird remote
presentation semantics of virtualized guests on top of the same
infrastructure that would be used on real hardware. If you're saying
that it's not the way userspace will work, then yeah, that doesn't
help. My issue, which is general to para-virtualized drivers, is that
any behavior that differs from hw drivers means that it's going to
break at some point; we see that even for basic things like the
update-layout hotplug events that have been largely standardized for
many years. I'm assuming that refresh-rate-changed will result in the
same regressions, but fwiw if I can implement FRR correctly and punt
any issues that arise due to changes in the FRR as issues in userspace
then that does make my life a lot easier, so I'm not going to object
to that.
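One reading of "implementing FRR correctly" in the sense discussed above is to report flip completion on a fixed grid even when the host presents late. A minimal sketch, assuming a hypothetical `struct fake_vblank` anchored at some base timestamp with a period derived from the quantised rate; none of these names are DRM or vmwgfx API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-rate vblank clock; not a DRM/vmwgfx structure. */
struct fake_vblank {
	uint64_t base_ns;   /* timestamp of synthetic vblank sequence 0 */
	uint64_t period_ns; /* 1e9 / quantised refresh rate */
};

/*
 * Sequence number of the first synthetic vblank at or after now_ns.
 * A flip the host actually presented late is reported as completing on
 * this grid, so userspace keeps seeing a fixed period; the lateness only
 * shows up as a skipped sequence number, like a missed vblank on real
 * hardware.
 */
static uint64_t fake_vblank_seq(const struct fake_vblank *vb, uint64_t now_ns)
{
	assert(vb->period_ns > 0);
	if (now_ns <= vb->base_ns)
		return 0;
	/* ceil((now - base) / period) */
	return (now_ns - vb->base_ns + vb->period_ns - 1) / vb->period_ns;
}

/* Timestamp to report for a given synthetic vblank sequence. */
static uint64_t fake_vblank_time(const struct fake_vblank *vb, uint64_t seq)
{
	return vb->base_ns + seq * vb->period_ns;
}
```

When the quantised rate changes, a real implementation would presumably re-anchor `base_ns` at the last reported vblank before switching `period_ns`, so reported timestamps stay monotonic.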

z