Re: [PATCH v4 2/4] drm/panic: Add a drm panic handler

Jocelyn Falempe <jfalempe@xxxxxxxxxx> · Tue, 10 Oct 2023 15:24:51 +0200

On 10/10/2023 14:59, Daniel Vetter wrote:
On Tue, Oct 10, 2023 at 02:15:47PM +0200, Maxime Ripard wrote:
On Tue, Oct 10, 2023 at 01:29:52PM +0200, Noralf Trønnes wrote:

On 10/10/23 11:25, Maxime Ripard wrote:

On Tue, Oct 10, 2023 at 10:55:09AM +0200, Thomas Zimmermann wrote:
So if I understand correctly, drm_panic would pre-allocate a plane/commit,
and use that when a panic occurs ?

And have it checked already, yes. We would only wait for a panic to
happen to pull the trigger on the commit.

I have two concern about this approach:
- How much memory would be allocated for this ? a whole framebuffer can be
big for just this use case.

As I outlined in my email at [1], there are a number of different scenarios.
The question of atomic state and commits is entirely separate from the DRM
panic handler. We should not throw them together. Whatever is necessary is
get a scanout buffer, should happen on the driver side of
get_scanout_buffer, not on the drm_panic side.

[1] https://lore.kernel.org/dri-devel/39bd4c35-8a61-42ee-8675-ccea4f5d4ac6@xxxxxxx/T/#m22f116e9438e00a5f0a9dc43987d4153424f8be1

I'dd expect a whole framebuffer for the current
configuration/resolution. It would be typically 4MB for a full-HD system
which isn't a lot really and I guess we can always add an option to
disable the mechanism if needed.

- I find it risky to completely reconfigure the hardware in a panic handler.

I would expect to only change the format and base address of the
framebuffer. I guess it can fail, but it doesn't seem that different to
the async plane update we already have and works well.

The one thing I don't understand is: Why should we use atomic commits in the
first place? It doesn't make sense for firmware-based drivers.

Because this is generic infrastructure that is valuable for any drivers
and not only firmware-based drivers?

In some drivers, even the simple ast, we hold locks during the regular
commit. Trying to run the panic commit concurrently will likely give a
deadlock.

You're in the middle of a panic. Don't take any locks and you won't deadlock.

In the end it's a per-driver decision, but in most cases, the driver can
easily switch to a default mode with some ad-hoc code.

When was the last time a per-driver decision has been a good thing? I'm
sorry, but the get_scanout_buffer approach buffer won't work for any
driver out there.

I'm fine with discussing alternatives if you don't like the ones I
suggested, but they must allow the panic handler infrastructure to work
with any driver we have, not just 4.

Why can't we use the model[1] suggested by Daniel using a draw_pixel
callback giving drivers full control on how they can put a pixel on the
display?

I share kind of the same general ideas/conclusions: "qthe idea is that
all the fb selection and lookup is handled in shared code (and with
proper locking, but only for atomic drivers)."

2016 is a bit old though and multiple developments happened since
(secure playback is a thing now, framebuffer compression too), so I
still think that their expectation that the framebuffer is accessible to
/ writable by the CPU no longer holds true.

I think largely it should still be ok, because the idea behind the draw_xy
callback was that the driver could take care of anything special, like
- tiling
- clearing compression bits so that just writing the raw pixels works (if
   your compression format allows for that)
- handling any differences in how the pixels need to be written, like
   cache flushing, mmio_write vs normal memory, amd also has peek/poke
   registers to be able to write even into unmappable memory

What would probably be a good idea is to do an s/void */uinptr_t/ over my
interface proposal, or maybe an even more opaque cookie since really the
only thing you can do is get the void * that ->panic_vmap returns and pass
it into ->panic_draw_xy.

I thought (but I didn't dig through details) that the panic fb struct is
essentially meant to serve this purpose in the current series?

Yes, in this series there is a struct drm_scanout_buffer, that the 
driver has to provide when a panic occurs:

struct drm_scanout_buffer {
	const struct drm_format_info *format;
	struct iosys_map map;
	unsigned int pitch;
	unsigned int width;
	unsigned int height;
};

That works well for CPU accessible, linear format.
It should be possible to support YUV, or simple tiling with that, but 
for the more complex case, I proposed to add a draw_pixel() callback.

This will even work for the AMD debug interface.
In the linear CPU accessible buffer case, we can provide a helper for
that, maybe we can do helpers for other common cases as well.

Yeah, their idea of a panic_draw would work great for that.

Adding to that we would need a panic_setup/enter and panic_teardown/exit
callback.

What for?

So panic teardown would be for testing in CI, to make it non-destructive
and clean up anything panic_vmap (or _enter or whatever you call it) has
done. I thought John Oggness was also looking into how the new panic
handlers/consoles could be made testable in the panic paths, including
lock stealing and getting called from nmi.

Thanks, I was also wondering why a panic teardown would be necessary, 
since after the panic handler has run, the system will probably halt. 
(or maybe run kdump and reboot).

-Sima