On 3/30/20 10:11 AM, Jani Nikula wrote:
On Fri, 27 Mar 2020, Daniel Dadap <ddadap@xxxxxxxxxx> wrote:
A number of hybrid GPU notebook computer designs with dual (integrated
plus discrete) GPUs are equipped with multiplexers (muxes) that allow
display panels to be driven by either the integrated GPU or the discrete
GPU. Typically, this is a selection that can be made at boot time as a
menu option in the system firmware's setup screen, and the mux selection
stays fixed for as long as the system is running and persists across
reboots until it is explicitly changed. However, some muxed hybrid GPU
systems have dynamically switchable muxes which can be switched while
the system is running.
NVIDIA is exploring the possibility of taking advantage of dynamically
switchable muxes to enhance the experience of using a hybrid GPU system.
For example, on a system configured for PRIME render offloading, it may
be possible to keep the discrete GPU powered down and use the integrated
GPU for rendering and displaying the desktop when no applications are
using the discrete GPU, and dynamically switch the panel to be driven
directly by the discrete GPU when render-offloading a fullscreen
application.
We have been conducting some experiments on systems with dynamic muxes,
and have found some limitations that would need to be addressed in order
to support use cases like the one suggested above:
* In at least the i915 DRM-KMS driver, and likely in other DRM-KMS
drivers as well, eDP panels are assumed to be always connected. This
assumption is broken when the panel is muxed away, which can cause
problems. A typical symptom is i915 repeatedly attempting to retrain the
link, severely impacting system performance and printing messages like
the following every five seconds or so:
[drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
This symptom might occur if something causes the DRM-KMS driver to probe
the display while it's muxed away, for example a modeset or DPMS state
change.
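To illustrate the kind of guard being asked for, here is a purely hypothetical sketch (nothing below exists in the DRM core or i915 today) of a connector .detect hook that keeps reporting the panel as connected, so userspace sees no change, but skips all hardware access while the panel is muxed away:

#include <linux/kernel.h>
#include <drm/drm_connector.h>

/*
 * Hypothetical: the muxed_away flag and the struct wrapping it do not
 * exist anywhere today. Some yet-to-be-defined API owned by the mux
 * driver would set the flag when the panel is routed to the other GPU.
 */
struct muxed_panel {
	struct drm_connector base;
	bool muxed_away;	/* set by the (hypothetical) mux owner */
};

static enum drm_connector_status
muxed_panel_detect(struct drm_connector *connector, bool force)
{
	struct muxed_panel *panel =
		container_of(connector, struct muxed_panel, base);

	/*
	 * Keep the status userspace sees unchanged, but skip DPCD
	 * reads and link training while we don't own the panel, so
	 * the retrain loop described above never starts.
	 */
	if (panel->muxed_away)
		return connector_status_connected;

	/* ... normal eDP probe and link-status handling would go here ... */
	return connector_status_connected;
}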
* When switching the mux back to a GPU that was previously driving a
mode, it is necessary, at the very least, to retrain the DP links to
restore the previously displayed image. In a proof of concept I have been
experimenting with, I am able to accomplish this from userspace by
triggering DPMS off and then back on again; however, it would be good to
have an in-kernel API to request that an output owned by a DRM-KMS
driver be refreshed to resume driving a mode on a disconnected and
reconnected display. This API would need to be accessible from outside
of the DRM-KMS driver handling the output. One reason it would be good
to do this within the kernel, rather than rely on e.g. DPMS operations
in the xf86-video-modesetting driver, is that it would be useful for
restoring the console if X crashes or is forcefully killed while the mux
is switched to a GPU other than the one which drives the console.
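For concreteness, the userspace round trip amounts to roughly the following minimal sketch, which uses the standard X11 DPMS extension (link with -lX11 -lXext); the actual proof of concept may differ in its details:

#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/dpms.h>

int main(void)
{
	Display *dpy = XOpenDisplay(NULL);
	int ev, err;

	if (!dpy || !DPMSQueryExtension(dpy, &ev, &err) || !DPMSCapable(dpy)) {
		fprintf(stderr, "DPMS unavailable\n");
		return 1;
	}

	DPMSEnable(dpy);

	/* DPMS off: the driver shuts down the link... */
	DPMSForceLevel(dpy, DPMSModeOff);
	XSync(dpy, False);

	/* ...the mux switch would happen at this point... */

	/* ...and DPMS on makes the driver retrain the link and
	 * restore the previously displayed image. */
	DPMSForceLevel(dpy, DPMSModeOn);
	XSync(dpy, False);

	XCloseDisplay(dpy);
	return 0;
}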
Basically, we'd like to be able to do the following:
1) Communicate to a DRM-KMS driver that an output is disconnected and
can't be used. Ideally, DRI clients such as X should still see the
output as being connected, so user applications don't need to keep track
of the change.
I think everything will be much easier if you provide a way for
userspace to control the muxing using the KMS API, and don't lie to
userspace about what's going on.
You're not actually saying what component you think should control the
muxing.
Why should the drivers keep telling userspace the output is
connected when it's not? Obviously userspace should also switch to
using a different output on a different GPU, right? Or are you planning
some proprietary behind-the-scenes hack for discrete?
The desire to lie to userspace is driven mainly by the wish to avoid
desktop environments and window managers reacting to the display going
away. Many desktops will do things like try to migrate windows in
response to a change in the current display configuration, and updating
all of them to avoid doing so when a display appears to disappear from
one GPU and reappear on another seems harder than letting userspace
believe that nothing has changed. I wouldn't mind if, e.g., X drivers
were in on the lie and the lie boundary shifted to RandR, but it would
be nice to avoid having to deal with the fallout of desktop
environments handling displays that apparently vanish and re-appear.
The particular use case we're envisioning here is as follows:
* GPU A drives an X protocol screen which hosts a desktop session.
Applications are rendered on GPU A by default. The mux is switched to
GPU A by default.
* GPU B drives a GPU screen that can be used as a PRIME render offload
source. Applications rendered on GPU B can run in windows presented by
GPU A via PRIME render offloading.
* If an application rendered on GPU B and presented on GPU A becomes
fullscreen, the mux can switch to GPU B and GPU B can present the
application directly for as long as the application remains in the
foreground and fullscreen.
* The mux switches back to GPU A, and the application returns to
presenting via GPU A with render offloading, when the application
becomes windowed again or another window occludes it.
I think DRI3 render offload works a bit differently, but hopefully the
high-level concept is somewhat applicable to that framework as well.
As for what should be controlling the muxing, I suppose that depends on
what you mean by controlling:
If you mean controlling the mux device itself, that should be a platform
driver that offers an API to execute the mux switch itself. The existing
vga-switcheroo framework would be a natural fit, but it would need some
substantial changes in order to support this sort of use case. I've
described some of the challenges we've observed so far in my response to
Daniel Vetter.
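For illustration, hooking a mux into today's vga-switcheroo would look roughly like the sketch below. The mydev_set_mux() helper standing in for the platform-specific mux poke (ACPI, EC, GPIO, ...) is hypothetical, and a real handler would also need to handle DDC/AUX routing and the dynamic-switching gaps mentioned above:

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/vga_switcheroo.h>

/* Hypothetical: whatever platform mechanism actually flips the mux. */
static int mydev_set_mux(enum vga_switcheroo_client_id id)
{
	/* Platform-specific: route the panel to the requested GPU. */
	return 0;
}

static int mymux_switchto(enum vga_switcheroo_client_id id)
{
	return mydev_set_mux(id);
}

static enum vga_switcheroo_client_id
mymux_get_client_id(struct pci_dev *pdev)
{
	/* Crude heuristic for the sketch: Intel iGPU vs. everything else. */
	return pdev->vendor == PCI_VENDOR_ID_INTEL ?
		VGA_SWITCHEROO_IGD : VGA_SWITCHEROO_DIS;
}

static const struct vga_switcheroo_handler mymux_handler = {
	.switchto	= mymux_switchto,
	.get_client_id	= mymux_get_client_id,
};

static int __init mymux_init(void)
{
	return vga_switcheroo_register_handler(&mymux_handler, 0);
}

static void __exit mymux_exit(void)
{
	vga_switcheroo_unregister_handler();
}

module_init(mymux_init);
module_exit(mymux_exit);
MODULE_LICENSE("GPL");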
If you mean what should drive the policy of when automatic mux switches
occur, it would have to be something that is aware of what at least one
of the GPUs is displaying. It could be one of the GPU drivers, or a
client of the GPU drivers, e.g. X11 or a Wayland compositor.
For the proof of concept experiments we are conducting, both of these
roles are currently performed by components of the NVIDIA proprietary
GPU stack, but the functionality could be moved to another component
(e.g. vga-switcheroo, X11, server-side GLVND, ???) if the necessary
functionality becomes available in the future.
BR,
Jani.
2) Request that a mode that was previously driven on a disconnected
output be driven again upon reconnection.
If APIs to do the above already exist, I wasn't able to find
information about them. These could be handled as separate APIs, e.g.
one to set the connected/disconnected state and another to restore an
output, or as a single API, e.g. one that signals a disconnect or
reconnect and leaves it up to the driver receiving the signal to set
the appropriate internal state and restore the reconnected output.
Another possibility would be an API to disable and enable individual
outputs from outside of the DRM-KMS driver that owns them. I'm curious
to hear the thoughts of the DRM subsystem maintainers and contributors
on what the best approach to this would be.
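To make the shape of such an API concrete, here is a purely hypothetical sketch; neither function exists in the DRM core today, and the names are made up:

/* 1) Tell the owning driver that the output has been muxed away and
 *    must not be driven or probed, without changing the connector
 *    status reported to userspace. (Hypothetical.) */
int drm_connector_set_mux_routed(struct drm_connector *connector,
				 bool routed_to_us);

/* 2) Ask the owning driver to re-drive whatever mode was active
 *    before the output went away, e.g. retrain the DP link and
 *    restore the previously displayed image. (Hypothetical.) */
int drm_connector_resume_mode(struct drm_connector *connector);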
--
Jani Nikula, Intel Open Source Graphics Center