RFC - DRM-based UAPI for Dynamic Mux Switching

Daniel Dadap <ddadap@xxxxxxxxxx> · Tue, 8 Nov 2022 12:31:59 -0600

Many dual-GPU notebook systems are equipped with a multiplexer ("mux")
to switch the signal source for the internal display panel between the
discrete and integrated GPUs. The vga-switcheroo infrastructure in the
Linux kernel can expose mux switch functionality to userspace via a
simple debugfs file; however, vga-switcheroo has several limitations
which make it unsuitable for some newer systems which support switching
the mux while the panel is actively driven (henceforth referred to as
"dynamic" mux switching), with the intention to migrate display
responsibility from one GPU to the other. A new userspace API for mux
switching can overcome these limitations, while providing a cleaner
interface for DRM clients to interact with muxable displays.

Hardware which is capable of dynamic mux switching is now generally
available, labeled with the marketing name "NVIDIA Advanced Optimus":

https://www.nvidia.com/en-us/geforce/news/rtx-laptops-advanced-optimus/

Internally, we have been experimenting with implementing dynamic mux
switch capabilities on Linux for some time. An early X11/GLX prototype
was developed as an internal proof of concept, but it relies on several
tricks to hide details about the mux switch from the X server and GLX
clients. A more recent prototype which extends vga-switcheroo and
updates the existing in-tree DRM-KMS drivers to use the new
vga-switcheroo extensions revealed the need for userspace to be more
actively involved in the mux switch process, leading to the design
proposal that follows:

Background: Limitations of vga-switcheroo

The current vga-switcheroo design has several limitations which are not
compatible with dynamic mux switching. The biggest of these is that the
system seems to be designed around the assumption that mux switches are
only possible when neither GPU is actively driving the display. The API
specifies a can_switch() callback function that "client" (i.e., GPU)
drivers register with vga-switcheroo so that they can report whether a
switch is possible. Currently, every implementation of this callback
will report that switching is possible IFF the reference count of active
modesetting clients is zero. There are currently three types of switch
events defined in the existing vga-switcheroo UAPI:

1. "Normal" mux switches (this switch type doesn't really have a name):
   Switch the mux immediately. This switch type calls the can_switch()
   callback and will fail if either client driver reports that it is
   busy and switching is not possible. If the client driver does not
   report that it handles power control, vga-switcheroo will power on
   the switched-to device and power off the switched-away-from device
   via the registered set_gpu_state() callbacks.
2. Delayed mux switches:
   Register the intent to switch the mux. The switch will be executed
   once there are no more active modesetting clients.
3. Mux-only switches:
   Switch the mux immediately, without checking the can_switch()
   callback, and without attempting to manipulate the power state of
   either GPU.

On the surface, the mux-only switch type seems like it might be suitable
for dynamic switches; however, this type of switch does not call *any*
of the client callbacks, so the client drivers are not made aware of the
mux switch event and cannot respond appropriately.
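
For reference, the three switch types above correspond to the command
strings accepted by the vga-switcheroo debugfs file (normally
/sys/kernel/debug/vgaswitcheroo/switch). The sketch below is only an
illustration of that mapping; the enum and function names are made up
for this example and are not part of any existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the three switch types described above. */
enum switch_type { SWITCH_NORMAL, SWITCH_DELAYED, SWITCH_MUX_ONLY };
enum switch_target { TARGET_IGD, TARGET_DIS };

/* Map a switch type and target GPU to the debugfs command string. */
static const char *switcheroo_command(enum switch_type type,
                                      enum switch_target target)
{
    static const char *const cmds[3][2] = {
        [SWITCH_NORMAL]   = { "IGD",  "DIS"  },
        [SWITCH_DELAYED]  = { "DIGD", "DDIS" },
        [SWITCH_MUX_ONLY] = { "MIGD", "MDIS" },
    };

    assert((unsigned)type <= SWITCH_MUX_ONLY);
    assert((unsigned)target <= TARGET_DIS);
    return cmds[type][target];
}

/*
 * Write the command to the given path (the caller passes the debugfs
 * file). Returns 0 on success and -1 if the file cannot be opened or
 * written, e.g. because debugfs is disabled or restricted.
 */
static int switcheroo_request(const char *path, enum switch_type type,
                              enum switch_target target)
{
    const char *cmd = switcheroo_command(type, target);
    size_t len = strlen(cmd);
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fwrite(cmd, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Note that nothing in this path involves the modesetting client: any
sufficiently privileged process can issue the write.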

In particular, the reprobe() callback (which seems to be unimplemented
in every driver except for Nouveau) would be useful to alert the
switched-to GPU that it should reprobe the connected displays, e.g. to
retrain the link on eDP panels. Another callback that would be useful
for dynamic mux switching would be a new callback for enabling and
disabling Panel Self Refresh (PSR): this would be called on the
switched-away-from GPU to enable PSR before switching the mux, then
called on the switched-to GPU to disable PSR after switching the mux,
to enable a flicker-free transition from one GPU to the other.
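
The ordering described above could be sketched roughly as follows. This
is a toy model, not kernel code: set_psr() stands in for the proposed
new callback, the struct and function names are hypothetical, and the
recording stubs at the end exist only so the call order can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-GPU callbacks; set_psr() is the proposed addition. */
struct mux_client_ops {
    int (*set_psr)(void *gpu, bool enable);
    int (*reprobe)(void *gpu);
};

struct mux_client {
    const struct mux_client_ops *ops;
    void *gpu;
};

/* Hypothetical handler hook that flips the physical mux to 'to'. */
typedef int (*mux_switch_fn)(const struct mux_client *to);

static int dynamic_mux_switch(mux_switch_fn do_switch,
                              const struct mux_client *from,
                              const struct mux_client *to)
{
    int ret;

    assert(do_switch && from && to);

    /* Panel starts self-refreshing before its source goes away. */
    ret = from->ops->set_psr(from->gpu, true);
    if (ret)
        return ret;

    ret = do_switch(to);
    if (ret) {
        /* Mux did not move: let the old GPU resume driving the panel. */
        from->ops->set_psr(from->gpu, false);
        return ret;
    }

    /* New GPU rediscovers the panel, e.g. retrains the eDP link. */
    ret = to->ops->reprobe(to->gpu);
    if (ret)
        return ret;

    /* New GPU takes over scanout; panel leaves self refresh. */
    return to->ops->set_psr(to->gpu, false);
}

/* Recording stubs, for illustration only. */
static char trace[8];
static size_t trace_len;

static void record(char c)
{
    if (trace_len + 1 < sizeof(trace)) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static int stub_set_psr(void *gpu, bool enable)
{
    (void)gpu;
    record(enable ? 'P' : 'p');
    return 0;
}

static int stub_reprobe(void *gpu)
{
    (void)gpu;
    record('R');
    return 0;
}

static int stub_switch(const struct mux_client *to)
{
    (void)to;
    record('M');
    return 0;
}

static const struct mux_client_ops stub_ops = { stub_set_psr, stub_reprobe };
```

With the stubs, a successful switch records PSR enable, mux switch,
reprobe, and PSR disable, in that order.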

vga-switcheroo is also built on the assumption that a system will have
at most one graphics mux, and that the mux is connected to exactly two
GPUs, one of them integrated and the other discrete. While this is the
most common configuration for GPU-muxed systems by far, there are some
existing designs with more than one mux, and it limits the possible
configurations for future designs. Notably, existing systems with two
muxes (one for the internal panel, and another for an external display
connector) expose the muxes as a single logical mux which switches both
muxes at the same time.

These assumptions and limitations made sense for what was possible with
X.org at the time vga-switcheroo was designed: specifically, that there
can only ever be one GPU at a time with functioning 3D graphics, but in
this age of GLVND and PRIME Render Offloading, many multi-vendor use
cases are now possible which previously were not. While it is possible
to extend the existing vga-switcheroo subsystem to support dynamic
switching, and while vga-switcheroo could probably use a new UAPI anyway
(the current debugfs UAPI is unavailable if debugfs is disabled or
restricted, e.g. by the lockdown "confidentiality" mode), coupling mux
switching more tightly with DRM may make things simpler for both DRM-KMS
userspace clients and DRM-KMS modesetting drivers.

Proposal: Make mux switching part of the DRM-KMS atomic modeset UAPI

As the current vga-switcheroo UAPI is exposed via a debugfs file, this
means that switching the mux must be controlled separately from and in
addition to other work a mux-switching client might need to do in order
to prepare for and react to a mux switch. Presently, no particular
component is expected to be responsible for driving the mux, and the
mux can be switched from anywhere in userspace via the debugfs file.
This is fine when mux switches are only possible while neither GPU is
actively driving a graphical display session, but poses a problem when
mux switches become possible while there are still actively running
modesetting clients. In addition to notifying the GPU drivers of the
event via the vga-switcheroo callback mechanism, the modesetting clients
would also need to be aware of, if not involved in, the mux switch
event.

Since the active modesetting clients already need to have the context of
what is being displayed on which display, it makes sense to assign the
role of dynamic mux switch control to the modesetting clients (e.g. an X
server or a Wayland compositor). And since the modesetting clients will
likely already want to stop displaying on the switched-away-from
connector and then start displaying on the switched-to connector, it
makes sense to attach the mux switch request to the modeset requests
that would already be necessary.

A proposal for driving dynamic mux switches via a DRM-centric
alternative to the existing vga-switcheroo model follows:

* Rather than registering the mux targets at GPU-level granularity (each
  vga-switcheroo client is associated with a pci_device), they could be
  registered at connector-level granularity by the DRM-KMS drivers.
* Each connector would be associated with an individual mux. Any mux
  which has two or more associated connectors will be capable of
  switching between them.
* The association of a connector with a particular mux would be exposed
  via an immutable drm_connector property. This will allow clients to
  ascertain which connectors share a mux, and can therefore be switched
  between. This property would be an opaque-to-userspace handle
  identifying a particular mux. All connectors which share a mux will
  have the same handle value reflected in the value of this property.
* An additional immutable drm_connector property will indicate an index
  for each display on the mux. The combination of a mux handle plus this
  connector index is globally unique within the system. The connector
  index values are unique to each connector within a given mux, but are
  not unique across muxes.
* A read/write drm_connector property will indicate the connector which
  is currently muxed when read, and will request a switch to another
  connector on the same mux when written.
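
To make the property model above more concrete, here is a minimal
sketch of how a client might represent the three properties after
reading them and work out which connectors it can switch between. The
struct, its field names, and the convention that a handle of 0 means
"not muxed" are all assumptions made for this example; the proposal
does not name the properties or reserve any handle value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side copy of the proposed connector properties. */
struct muxed_connector {
    uint32_t connector_id; /* DRM connector object ID on its own device */
    uint64_t mux_handle;   /* immutable, opaque; 0 = not muxed (assumed) */
    uint64_t mux_index;    /* immutable, unique within one mux */
    uint64_t mux_active;   /* read/write: index the mux currently routes */
};

/* Two distinct connectors are switchable if they share a mux handle. */
static int shares_mux(const struct muxed_connector *a,
                      const struct muxed_connector *b)
{
    return a != b && a->mux_handle != 0 && a->mux_handle == b->mux_handle;
}

/* Is the mux currently routing the display to this connector? */
static int is_muxed_to(const struct muxed_connector *c)
{
    return c->mux_handle != 0 && c->mux_active == c->mux_index;
}

/*
 * Collect the connectors that 'from' could be switched to. Writes at
 * most out_len pointers into 'out' and returns how many were written.
 */
static size_t find_switch_targets(const struct muxed_connector *all,
                                  size_t n,
                                  const struct muxed_connector *from,
                                  const struct muxed_connector **out,
                                  size_t out_len)
{
    size_t found = 0;

    assert(all && from && out);
    for (size_t i = 0; i < n && found < out_len; i++) {
        if (shares_mux(from, &all[i]) &&
            all[i].mux_index != from->mux_index)
            out[found++] = &all[i];
    }
    return found;
}
```

Since the handle is opaque, the client only ever compares handles for
equality; the (handle, index) pair is what identifies a connector
across devices.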

These properties would allow a DRM-KMS client to initiate a mux switch as
follows (some details about error handling and validation checks are
omitted for brevity):

1. The client prepares and validates an atomic modeset request which
   sets the "switched-to display" property (the read/write property
   described above) on the switched-away-from connector to the value of
   the "mux value" property (the connector index) on the switched-to
   connector.
2. The client commits the switch-away-from request.
3. The switched-away-from KMS driver enables PSR on the pending
   switched-away-from connector. The client may stop presenting to the
   switched-away-from connector after this is complete.
4. DRM core calls into the mux driver which registered with the
   connector to switch the mux to the switched-to connector.
5. The switched-to DRM driver reprobes the connector and can now detect
   the previously disconnected display.
6. The client prepares and validates an atomic modeset request setting a
   mode on the switched-to connector.
7. The client commits the modeset request on the switched-to connector.
8. The switched-to KMS driver disables PSR on the switched-to connector.
9. Both atomic modesets are complete, and the client may begin
   presenting on the switched-to connector.
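
The steps above can be modeled as two commits against a shared panel
state. The following is a deliberately simplified sketch: the types and
function names are hypothetical, both "commits" are plain function
calls rather than real atomic requests, and the model assumes a driver
that rejects a modeset on a connector the mux is not routed to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one muxed panel shared by connectors on two devices. */
struct muxed_panel {
    unsigned active_index; /* connector index the mux routes to */
    bool self_refresh;     /* panel is currently in PSR */
};

struct connector {
    struct muxed_panel *panel;
    unsigned index;
    bool connected;
    bool mode_set;
};

/* Steps 1-5: commit on the switched-away-from connector. */
static int commit_switch_away(struct connector *from, struct connector *to)
{
    assert(from && to);

    if (from == to || from->panel != to->panel)
        return -EINVAL; /* target must be another connector, same mux */
    if (from->panel->active_index != from->index)
        return -EINVAL; /* mux is not currently routed to 'from' */

    from->panel->self_refresh = true;      /* step 3: enable PSR */
    from->panel->active_index = to->index; /* step 4: switch the mux */
    from->connected = false;
    from->mode_set = false;
    to->connected = true;                  /* step 5: reprobe finds panel */
    return 0;
}

/* Steps 6-8: commit on the switched-to connector. */
static int commit_modeset(struct connector *to)
{
    assert(to);

    if (!to->connected || to->panel->active_index != to->index)
        return -EINVAL; /* panel is not reachable from this connector */

    to->mode_set = true;              /* steps 6-7: mode is programmed */
    to->panel->self_refresh = false;  /* step 8: disable PSR */
    return 0;
}
```

In this model the second commit fails if issued before the first, which
is the reason the requests come as an ordered pair.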

The atomic modeset requests come in a switch-away-from and switch-to
pair, since it is likely that the switched-away-from and switched-to
connectors would be on different drm_devices. In the case of the
switched-away-from and switched-to connectors being on the same
drm_device, it should theoretically be possible to perform a single
commit if the KMS driver is capable of validating the mode request on
the switched-to connector while it is still disconnected.

This design also provides natural synchronization points for operations
that the DRM-KMS drivers will need to take before and after the mux
switch, eliminating the need for dedicated callbacks for setting PSR
state and reprobing the displays. This will allow much of the process
to reuse existing code in the various DRM-KMS drivers which will need to
support dynamic mux switching, minimizing the number of changes required
to existing in-tree and out-of-tree DRM-KMS drivers. The modesetting
client has complete control over and visibility into the state of each
muxable connector, and it is not possible for arbitrary userspace
programs to switch the mux. The design also allows for an arbitrary
number of muxes, each one able to switch between an arbitrary number of
connectors. It is even possible to have a mux switch between multiple
connectors on the same GPU.

Additional possible features:

In addition to the proposed drm_connector-based mux switching UAPI,
additional functionality can be handled by this framework (and could
also be added to vga-switcheroo, via appropriate extensions):

* EDID dispatch - vga-switcheroo supports switching only the DDC lines
  on older mux designs where this was possible, to allow the driver for
  the inactive GPU to read the panel's EDID while switched away. Modern
  notebook designs use eDP panels, and therefore cannot switch out DDC
  in isolation. This may not actually be necessary if the switched-to
  DRM-KMS driver is going to reprobe the connector before the client
  prepares an atomic request for the switched-to connector.
* HDA dispatch - GPU HDA drivers register with vga-switcheroo, but
  mainly for power management purposes. Since vga-switcheroo doesn't
  support switching an active desktop session from one GPU to another,
  users can simply switch to using the newly connected GPU's HDA
  controller after the switch. However, for dynamic switching it may be
  desirable to have a single consistent HDA device across both sides of
  the switch. This is probably easier said than done, but since this
  really isn't an issue for internal panels, which tend not to have
  GPU-driven audio, and there are relatively few designs with dynamic
  muxes for external displays, this doesn't need to be a particularly
  high priority issue.

Some problems that need solving:

* There would need to be some way to validate, with the first commit on
  the switched-away-from connector's device, that the client also has
  permission to set a mode on the switched-to connector's device. One
  way to do this might be to expose an immutable magic cookie property
  on each muxed connector, and set the magic cookie value of the target
  connector when requesting the mux switch. However, it might also be
  useful to have a more robust generic solution for cross-device
  authentication for other use cases that require it.
* It is unclear what the best method would be for discovering which
  mux is associated with which connectors. For the simplistic and common
  case of one mux which can switch to connect one internal panel between
  either of two GPUs, this is trivial, but there is no obvious solution
  for the more general case. In the current prototype, the mux handler
  driver registers a callback which gets called once for each connector
  when DRM-KMS drivers register their connectors, and then assigns the
  mux ID for each connector that it is responsible for. This is similar
  to the get_client_id() handler callback in the existing vga-switcheroo
  design.
* Details for how to handle console restoration, e.g. if a client
  crashes while mux-switched away from the device driving the fbcon,
  should be worked out.
* The client will ideally be able to query the primary planes for the
  connector which is about to be switched to, before the switch occurs.
  Perhaps this could be done by having the target DRM-KMS driver allow
  planes to be queried on a disconnected connector if it is associated
  with a mux, or at the very least expose a list of supported formats
  that is accessible regardless of display connection status. In either
  case, some amount of foreknowledge would be required on the part of
  the to-be-switched-to connector's DRM driver, at least for displays
  which are attached to a mux.
* There should be a way to request switches for external muxes which do
  not currently have a display connected. DRM-KMS modesets don't really
  make sense for this case, since there's nothing to set a mode on.
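
As an illustration of the magic cookie idea from the first item in this
list, the check on the switched-away-from device might look something
like the sketch below. Everything here is hypothetical: the struct
layouts, the error codes chosen, and the assumption that a cookie can
only be read by a client that has access to the target connector's
device. A real design would also need the cookie to be unguessable,
which this sketch does not address.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-connector state known to the kernel. */
struct mux_target {
    uint64_t mux_handle; /* identifies the mux */
    uint64_t mux_index;  /* position of this connector on the mux */
    uint64_t cookie;     /* immutable "magic cookie" property */
};

/* Hypothetical payload attached to the switch-away-from commit. */
struct mux_switch_request {
    uint64_t target_index;  /* connector index to switch to */
    uint64_t target_cookie; /* cookie read from the target connector */
};

/*
 * Validate a switch request from 'from' to 'to'. Returns 0 if the
 * request is well-formed and carries the target's cookie, -EINVAL if
 * the connectors or index do not match, -EACCES if the cookie is wrong.
 */
static int validate_switch_request(const struct mux_target *from,
                                   const struct mux_target *to,
                                   const struct mux_switch_request *req)
{
    assert(from && to && req);

    if (from->mux_handle != to->mux_handle)
        return -EINVAL; /* not on the same mux */
    if (req->target_index != to->mux_index)
        return -EINVAL; /* request names a different connector */
    if (req->target_cookie != to->cookie)
        return -EACCES; /* client never read the target's cookie */
    return 0;
}
```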

One of the design goals in this proposal was to avoid introducing new
dedicated DRM APIs, which is why the functionality is expressed via
connector properties. However, it may be difficult to solve some of
these problems (particularly the cross-driver permission one) without
new APIs.