Re: Handling DRM master transitions cooperatively

Pekka Paalanen <ppaalanen@xxxxxxxxx> · Tue, 7 Sep 2021 13:07:46 +0300

On Fri, 3 Sep 2021 21:08:21 +0200
Dennis Filder <d.filder@xxxxxx> wrote:

> Hans de Goede asked me to take a topic from a private discussion here.
> I must also preface that I'm not a graphics person and my knowledge of
> DRI/DRM is cursory at best.
> 
> I initiated the conversation with de Goede after learning that the X
> server now supports being started with an open DRM file descriptor
> (this was added for Keith Packard's xlease project).  I wondered if
> that could be used to smoothen the Plymouth->X transition somehow and
> asked de Goede if there were any such plans.  He denied, but mentioned
> that a new ioctl is in the works to prevent the kernel from wiping the
> contents of a frame buffer after a device is closed, and that this
> would help to keep transitions smooth.

Hi,

I believe the kernel is not wiping anything on device close. If
something in the KMS state is wiped, it originates in userspace:

- Plymouth doing something (e.g. RmFB on an in-use FB will turn the
  output off, you need to be careful to "leak" your FB if you want a
  smooth hand-over)

- Xorg doing something (e.g. resetting instead of inheriting KMS state)

- Something missed in the hand-off sequence which allows fbcon to
  momentarily take over between Plymouth and Xorg. This would need to
  be fixed between Plymouth and Xorg.

- Maybe systemd-logind does something odd to the KMS device? It has
  pretty wild code there. Or maybe it causes fbcon to take over.

What is the new ioctl you referred to?

Being able to be started with an open DRM file descriptor is not
necessary for a smooth hand-over as far as I know. There are tons of
other details that are, though.

> 
> I am a bit disappointed with this being considered a desirable way of
> handling that transfer of control over a shared DRM device as it shows
> a lack of ambition.  Sure, it's probably easy to implement, but it

Or more likely bigger fires and lack of time, like with everything.

> will also greatly limit how such transitions can be presented to the
> user.  In practice it would mean plymouthd closing the DRM device and
> exiting so that systemd can start the display manager which then
> starts an X server to present the login screen.  If for that several
> shared libraries have to first be loaded and relocated while the
> system is under heavy load then there will be a noticeable delay
> manifesting as a frozen screen.  After that the best you can hope for
> is blending the still-frame over into the login screen (or whatever
> comes then).  The VT-API-based switching mechanism currently en vogue
> suffers from similar limitations.

All that is already solvable purely in userspace in my opinion, today.
It's just a big project over several independent userspace software
projects.

> If the approach to transferring control were to be changed to a scheme
> that involves both donor and recipient process connecting to each
> other on a unix socket and actively coordinating the transfer
> (i.e. the calls to drmSetMaster and drmDropMaster) then this would
> open the door to a host of possibilities.  Not only could the
> transition be kept infinitesimally short since both processes are
> already up, but it could also involve e.g. the recipient continuing an
> animation the donor had going reusing state that is transferred as a
> memfd.  This way there wouldn't be any noticeable freezes on the
> display making for a far more polished, and thus impressive
> experience.  It would be a feat a program alone cannot achieve on its
> own.  Another option made possible would be implementing a watchdog.
> If the recipient transfers e.g. file descriptors for a pipe and a
> pidfd of itself, then the donor could monitor those for a
> heartbeat/process termination and take back control over the device if
> something goes awry (deadlock/crash) and initiate a recovery
> mechanism.  With the other approach implementing such features is
> simply not possible.

Nothing in the kernel stops userspace developers from doing exactly
that. Seems like you would be working on Plymouth and a display server
of your choice. Don't forget to count in systemd-logind as well, since
that is a popular component for managing sessions and is involved with
e.g. drmSetMaster.
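
The watchdog idea quoted above can be prototyped with nothing more than
a pipe. Below is a minimal sketch, assuming the recipient writes one
byte per heartbeat to a pipe whose read end the donor holds. The
function name and the three return strings are made up for
illustration. A pidfd from os.pidfd_open could be added to the same
select() set to catch process exit directly; it is left out here to
keep the sketch small.

```python
import os
import select


def watch_heartbeat(read_fd, timeout_s):
    """Wait for one heartbeat on read_fd (hypothetical helper).

    Returns "alive" if a byte arrived, "exited" on EOF (every write end
    is closed, e.g. the recipient died), or "stalled" on timeout.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout_s)
    if not ready:
        return "stalled"  # no heartbeat in time: possible deadlock
    data = os.read(read_fd, 1)
    return "alive" if data else "exited"
```

On "stalled" or "exited" the donor would take the device back and start
whatever recovery it has; what that recovery looks like is outside this
sketch.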

It's a good goal.

It's also more or less necessary for a smooth hand-over of a KMS device
between any two processes, for other reasons I've discussed in the past
with DRM developers: any KMS client (a program using KMS) needs to
reach a guaranteed "clean" KMS state to display correctly.

The kernel DRM subsystem never resets KMS state in any way, apart from
driver initialisation.

This means that when a new KMS client takes over, the KMS state could
be anything, whatever the previous KMS client left it in. This is a
problem, because the KMS client may not know how to reset all the KMS
properties to clean, sane defaults. Currently there is no reset ioctl
in the kernel, and no userspace solution for storing a sane default
state either. The problem arises from KMS properties: each KMS
client may not know how to program all the KMS properties the kernel
supports on the device. For example, if one KMS client leaves the
output in HDR mode, and the next KMS client does not understand the HDR
property, then quite likely the latter KMS client will display an awful
image without knowing it.

There is also the convention of a KMS client not restoring the
inherited KMS state on exit or switch-out, because that could cause
unnecessary flicker on screen and delays. This amplifies the above
problem.

The only time the KMS state is at "sane" defaults is right after driver
initialisation. Presumably. So only the first KMS client after a reboot
can expect a sane KMS state.

Mind, fbcon is also a KMS client, it's just a kernel-internal one.
Letting fbcon take over momentarily can reset some KMS properties to
nice defaults, but not all, because when new KMS properties are added,
no-one usually remembers to patch fbcon/fbdev/whatever to reset them
when fbcon takes over. Switching temporarily to fbcon also causes
flicker and delays.

The above problems could perhaps be solved if there was a generic KMS
hand-over protocol in userspace. The two KMS clients could agree on
which KMS properties should be reset and by whom, and on who
understands and programs which KMS properties. And perhaps some system
component in early'ish boot could save the driver-initial KMS state,
which is presumably the good default for any use case, for the
properties any particular KMS client does not know or bother to
program by itself.
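
As a toy model of that last idea, the sketch below works out which
properties an incoming client would have to write back from a saved
driver-initial snapshot. It is only a model under loud assumptions: KMS
state is flattened into a plain name-to-value dict (real properties
hang off CRTC, connector and plane objects), the function name is
invented, and the values in the usage note are illustrative rather
than read from a device.

```python
def plan_reset(current, saved_defaults, known_to_new_client):
    """Return the {property: value} writes that restore saved defaults.

    Only properties the new client does not program itself are
    considered, and only those that differ from the saved snapshot.
    Properties missing from the snapshot are left alone.
    """
    return {
        name: saved_defaults[name]
        for name, value in current.items()
        if name not in known_to_new_client
        and name in saved_defaults
        and value != saved_defaults[name]
    }
```

For the HDR case: with current state {"HDR_OUTPUT_METADATA": 42,
"GAMMA_LUT": 0, "rotation": 1}, a snapshot where HDR_OUTPUT_METADATA is
0 and the rest are unchanged, and a client that only knows "rotation",
the plan is a single write setting HDR_OUTPUT_METADATA back to 0.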

So yes, a userspace protocol for KMS hand-off would be very welcome.
But who would have the time to develop it, when so far we can just
limp forward with the current undocumented conventions?

Such a protocol, if widely used, might also make it unnecessary for KMS
clients to save and restore KMS properties they do not understand when
they switch out and later back in. When a KMS client has released
DRM master on the device, some other KMS client could have "messed up"
the KMS state, so restoring to what you used before is necessary. I
don't think anyone actually implements this save/restore yet for
unknown KMS properties, and it would be much easier to implement than
the hand-off protocol. Maybe switching is either not done, or it is
always done to/from a display manager process which sanitises the KMS
state or enforces that KMS clients do not leave random state behind. Or
maybe most KMS clients are just really good at agreeing on which KMS
properties everyone will use. If some stale KMS property causes a
problem on some KMS client (display server), it is pretty easy to just
add support for programming that property in the KMS client. Problem
solved and no hand-off protocol needed. Then the next KMS client hits
the same problem and does the same.

> Making processes talk to each other and work together like this would
> also be a far more accurate software representation of what is
> actually going on: different subsystems passing control over a shared
> device around to work towards the common goal of a good user
> experience.
> 
> A bit of context: The idea underlying this came from my experience
> with accessibility technology under Linux where uncoordinated fighting
> over the audio device among all kinds of processes led to countless
> ways in which things would break with no hope of ever fixing anything.
> It instilled in me the conviction that user-facing programs are broken
> if they are not written to talk to each other to coordinate access to
> shared resources for the goal of rendering a good user experience, but
> instead leave it to the distro maintainer/user to set things up into a
> static, brittle working order.  Seeing a much-needed cultural shift
> begin somewhere would be nice.  The Plymouth->X transition would lend
> itself well as a starting point since many building blocks are already
> there.

I might recommend picking a Wayland display server instead of Xorg
first. The thing is, Xorg is only a middle-man and it is some X11
client that decides what will be displayed and how. Therefore with the
X11 architecture, it's not just two processes that need to communicate
the hand-off, it's three: Plymouth, Xorg, and the X11 client that will
actually draw stuff.

In the Wayland architecture, you only need to communicate between
Plymouth and the Wayland compositor you picked. How that Wayland
compositor draws anything is an internal detail to it, so you can solve
that in-project.

You could also think that how Xorg gets the content is an internal
detail to the X11 desktop, but that might lead to needing some new X11
protocol extension to be able to control Xorg's actions on the KMS
device sufficiently. It may also be hard to get any new feature code
into Xorg and released.

Thanks,
pq