On Friday, May 10, 2024 9:11:13 AM EDT Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> >
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > >
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, with the real GPU driver appearing only
> > > moments later.
> > >
> > > The other is general purpose GPU hotplugging, including unplugging the
> > > GPU that the compositor has chosen as the primary one.
> >
> > The situation of booting with simpledrm (problem 1) is a special case of
> > problem 2. From the kernel's perspective, unloading simpledrm is the same as
> > what you call general purpose GPU hotplugging, even though there is not a
> > full GPU, but a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
>
> Sure, in a way it is, but the consequences and frequency of occurrence are
> quite different, so I think it makes sense to think of them as different
> problems, since they need different solutions. One is about fixing
> userspace components' support for arbitrary hotplugging, the other about
> mitigating the race condition that caused this discussion to begin with.
>
> >
> > >
> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > >
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > >
> > > When it works, i.e. the race condition is not hit, the sequence is this:
> > >
> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with the real GPU driver
> > > * The login screen interface is smoothly animating using hardware
> > > acceleration, presenting "advanced" graphical content depending on
> > > hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > >
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > >
> > > * System boots
> > > * Plymouth shows a "splash" screen
> > > * The login screen display server is launched with simpledrm
> > > * Due to using simpledrm, the login screen interface is not animated and
> > > just plops up, and no "advanced" graphical content is enabled due to
> > > apparent missing hardware capabilities
> > > * The real GPU driver appears, the login screen now starts to become
> > > animated, and may suddenly change appearance due to capabilities
> > > having changed
> > >
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > >
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem, it's a workaround (albeit a behavior we want for other
> > > reasons) for the race condition.
> >
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> >
> > Depending on the DRM driver and its hardware, it might be possible to do
> > this without flickering. The key is to not lose the original scanout
> > buffer while probing the new device driver. But that needs work in each
> > individual DRM driver.
>
> This doesn't sound like it'll fix any of the flickering as I described it.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
>

I feel like whatever animations a login screen has are going to be in the
realm of a fade-in, or maybe a sliding animation, i.e. something on the
simple side. llvmpipe should be good enough for animations like that these
days, I would think, right? Or is it really bad on very old CPUs, like, say,
a Pentium III?

> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.
>
> >
> > >
> > > Arguably, the only place that can make a more educated guess about
> > > whether to wait or not, and if so how long, is the kernel.
> >
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> >
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> >
> > - 'nomodeset' on the command line -> no native driver
>
> Makes sense to not wait here, and just assume simpledrm forever.
>
> > - 'systemd-modules-load' not started -> maybe wait
> > - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> > maybe wait
>
> I suspect this is not useful for general purpose distributions. I have
> 43 kernel GPU modules there, on a F40 installation.
>
> > - maybe udev can tell you more
> > - it might help detection that recently simpledrm devices refer to their
> > parent PCI device
> > - maybe systemd tracks the probed devices
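
For what it's worth, the 'nomodeset' check from the list above is the one
piece that looks cheap to do from userspace. A minimal sketch, assuming the
kernel command line is read from /proc/cmdline; the function names and the
"wait unless nomodeset" policy are hypothetical, only meant to illustrate
the idea, not what any compositor or display manager actually does:

```python
def has_nomodeset(cmdline: str) -> bool:
    """Return True if 'nomodeset' appears as a standalone kernel parameter."""
    # Parameters are whitespace-separated; match the whole token so that
    # something like "foo=nomodesetx" is not mistaken for the flag.
    return "nomodeset" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def should_wait_for_native_driver(cmdline: str) -> bool:
    """Hypothetical policy: with 'nomodeset' there is no native driver
    to wait for; in every other case a native driver *might* show up."""
    return not has_nomodeset(cmdline)
```

A caller would do should_wait_for_native_driver(read_cmdline()). This only
covers the first bullet; it says nothing about how long to wait.
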
>
> If the kernel already plumbs enough state so userspace components can
> make a decent decision, instead of just sleeping for an arbitrary amount
> of time, then great. This is to some degree what
> https://github.com/systemd/systemd/issues/32509 is about.
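
Until something like that exists, the least bad userspace fallback I can
think of is a bounded poll rather than a fixed sleep. A rough sketch, where
wait_for() and its parameters are made up for illustration, and the
predicate ("has a non-simpledrm device appeared yet?") is left to the
caller since how to detect that is exactly the open question here:

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool],
             timeout: float,
             interval: float = 0.05,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll predicate() until it returns True or timeout seconds pass.

    Returns True as soon as the predicate holds, False on timeout.
    The predicate is always checked at least once.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Unlike sleep(10), this returns as soon as the condition holds, so the cost
of the full timeout is only paid on systems where no native driver ever
appears. It is still a workaround for the race, not a fix.
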
>
>
> Jonas
>
> >
> > Best regards
> > Thomas
> >
> > >
> > >
> > > Jonas
> > >
> > > > The next best solution is to keep the final DRM device open until a new one
> > > > shows up. All DRM graphics drivers with hotplugging support are required to
> > > > accept commands after their hardware has been unplugged. They simply won't
> > > > display anything.
> > > >
> > > > Best regards
> > > > Thomas
> > > >
> > > >
> > > > > Thanks
> > > > >
> >
>
>