On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote: > On Tue, 10 Oct 2023 10:06:02 -0600 > jim.cromie@xxxxxxxxx wrote: > > > since I name-dropped you all, > > Hi everyone, > > I'm really happy to see this topic being developed! I've practically > forgot about it myself, but the need for it has not diminished at all. > > I didn't understand much of the conversation, so I'll just reiterate > what I would use it for, as a Wayland compositor developer. > > I added a few more cc's to get better coverage of DRM and Wayland > compositor developers. > > > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie@xxxxxxxxx> wrote: > > > > > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote: > > ... > > > > > I don't have a real life use case to configure different trace > > > > instance for each callsite. > > > > I just tried to be as much flexible as possible. > > > > > > > > > > Ive come around to agree - I looked back at some old threads > > > (that I was a part of, and barely remembered :-} > > > > > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen > > > have expressed a desire for a "flight-recorder" > > > it'd be hard to say now that 2-3 such buffers would always be enough, > > > esp as theres a performance reason for having your own. > > A Wayland compositor has roughly three important things where the kernel > debugs might come in handy: > - input > - DRM KMS > - DRM GPU rendering > > DRM KMS is the one I've been thinking of in the flight recorder context > the most, because KMS hardware varies a lot, and there is plenty of > room for both KMS drivers and KMS userspace to go wrong. The usual > result is your display doesn't work, so the system is practically > unusable to the end user. In the wild, the simplest or maybe the only > way out of that may be a reboot, maybe an automated one (e.g. digital > signage). In order to debug such problems, we would need both > compositor logs and the relevant kernel debug messages. > > For example, Weston already has a flight recorder framework of its own, > so we have the compositor debug logs. It would be useful to get the > selected kernel debug logs in the same place. It could be used for > automated or semi-manual bug reporting, for example, making the > administrator or end user life much easier reporting issues. > > Since this is usually a production environment, and the Wayland > compositor runs without root privileges, we need something that works > with that. We would likely want the kernel debug messages in the > compositor to combine and order them properly with the compositor debug > messages. > > It's quite likely that developers would like to pick and choose which > kernel debug messages might be interesting enough to record, to avoid > excessive log flooding. The flight recorder in Weston is fixed size to > avoid running out of memory or disk space. I can also see that Weston > could have debugging options that affect which kernel debug messages it > subscribes to. We can have a reasonable default setup that allows us to > pinpoint the problem area and figure out most problems, and if needed, > we could ask the administrator pass another debug option to Weston. It > helps if there is just one place to configure everything about the > compositor. > > This implies that it would be really nice to have userspace subscriber > specific debug message streams from the kernel, or a good way to filter > the messages we want. A Wayland compositor would not be interested in > file system or wireless debugs for example, but another system > component might be. There is also a security aspect of which component is > allowed to see which messages in case they could contain anything > sensitive (input debug could contain typed passwords). > > Configuring the kernel debug message selection for our debug message > stream can and probably should require elevated privileges, and we can > likely solve that in userspace with a daemon or such, to allow the > Wayland compositor to run as a regular user. > > Thinking of desktop systems, and especially physically multi-seat systems: > - there can be multiple different Wayland compositors running simultaneously > - each of them may want debug messages only from a specific DRM KMS > device instance, and not from others > - each of them may have a different idea of which debug messages are important > - because DRM KMS leasing is a thing, different compositor instances > could be using the same DRM KMS device instance simultaneously; since > this is specific to DRM KMS, and it should be harmless to get a > little too much DRM KMS debug (that is, from the whole device instead > of just the leased parts), it may not be worth to consider splitting > debug message streams this far. > > If userspace is offered some standardised fields in kernel debug > structures, then userspace could do some filtering on its own too, but I > guess it would be better to filter at the source and not need that. > > There is also an anti-goal. The kernel debug message contents are > specifically not machine-parsable. I very much do not want to impose > debug strings as ABI, they are for human (and AI?) readers only. > > > As a summary, here are the most important requirements first: > - usable in production as a normal thing to enable always by default > - final delivery to unprivileged userspace process I think this is the one that's trickiest, and I don't fully understand why you need it. The issues I'm seeing: - logs tend to leak a lot of kernel internal state that's useful for attacks. There's measures for the worst (like obfuscating kernel pointers by hashing them), so there's always going to be a difference here between what full sysadmin can get and what unpriviledged userspace can get. And there's always a risk we miss something that we should obfuscate but didn't. - handing this to userspace increases the risks it becomes uapi. Who's going to stop compositors from sussing out the reason an atomic commit failed from the logs if they can get them easily, and these logs contain very interesting information about why something failed? Much better if journald or a crash handler assemebles all the different flight recorder logs and packages it into a bug report so that the compositor cannot ever get at these directly. Yeah this needs some OS support with a dbus request or similar so that the compositor can ask for a crash dump with everything relevant to its session. - the idea of an in-kernel flight recorder is that it's really fast. The entire tracing infra is built such that recording an event is really quick, but printing it is not - the entire string formatting is delayed to when userspace reads the buffers. If you constantly push the log messages to userspace we toss the advantage of the low-overhead in-kernel flight recorder. If you push logs to dmesg there's a substantial measureable overhead which you don't really want in production, and your requirement would impose the same. - I'm not sure how this is supposed to mesh with userspace log aggregators like journald when every compositor has it's own flight recorder on top. Feels a bit like a solution that ignores the entire os stack and assumes that the only pieces we can touch are the kernel and the compositor to get to such a flight recorder. You might object that events will get out-of-order if you merge multiple logs after the fact, but that's the case anyway if we use the tracing framework since that's always a ringbuffer within the kernel and not synchronous. The only thing we could do is allow userspace to push markers into that ringbuffer, which is easily done by adding more debug output lines (heck we could even add a logging cookie to certain ioctl when userspace really cares about knowing exact ordering of it's requests with the stuff the kernel does). - If you really want direct deliver to userspace I guess we could do something where sessiond opens the flight recorder fd for you, sets it all up and passes it to the compositor. But I'm really not a big fan of sending the full kms dbg spam to compositors to freely digest in real time. > - per debug-print selection of messages (finer or coarser, categories > within a kernel sub-system could be enough) > - per originating device (driver instance) selection of messages The dyndbg stuff can do all that already, which is why I'm so much in favour of relying on that framework. > - all selections tailored separately for each userspace subscriber > (- per open device file description selection of messages) Again this feels like a userspace problem. Sessions could register what kind of info they need for their session, and something like journald can figure out how to record it all. If you want the kernel to keep separate flight recorders I guess we could add that, but I don't think it currently exists for the dyndbg stuff at least. Maybe a flight recorder v2 feature, once the basics are in. > That's my idea of it. It is interesting to see how far the requirements > can be reasonably realised. I think aside from the "make it available directly to unpriviledged userspace" everything sounds reasonable and doable. More on the process side of things, I think Jim is very much looking for acks and tested-by by people who are interested in better drm logging infra. That should help that things are moving in a direction that's actually useful, even when it's not yet entirely complete. Cheers, Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch