śr., 11 paź 2023 o 11:42 Daniel Vetter <daniel@xxxxxxxx> napisał(a): > > On Wed, Oct 11, 2023 at 11:48:16AM +0300, Pekka Paalanen wrote: > > On Tue, 10 Oct 2023 10:06:02 -0600 > > jim.cromie@xxxxxxxxx wrote: > > > > > since I name-dropped you all, > > > > Hi everyone, > > > > I'm really happy to see this topic being developed! I've practically > > forgot about it myself, but the need for it has not diminished at all. > > It's good to hear you guys are also interested in this feature. > > I didn't understand much of the conversation, so I'll just reiterate > > what I would use it for, as a Wayland compositor developer. > > > > I added a few more cc's to get better coverage of DRM and Wayland > > compositor developers. > > > > > On Tue, Oct 10, 2023 at 10:01 AM <jim.cromie@xxxxxxxxx> wrote: > > > > > > > > On Mon, Oct 9, 2023 at 4:47 PM Łukasz Bartosik <lb@xxxxxxxxxxxx> wrote: > > > > ... > > > > > > > I don't have a real life use case to configure different trace > > > > > instance for each callsite. > > > > > I just tried to be as much flexible as possible. > > > > > > > > > > > > > Ive come around to agree - I looked back at some old threads > > > > (that I was a part of, and barely remembered :-} > > > > > > > > At least Sean Paul, Lyude, Simon Ser, Pekka Paalanen > > > > have expressed a desire for a "flight-recorder" > > > > it'd be hard to say now that 2-3 such buffers would always be enough, > > > > esp as theres a performance reason for having your own. > > > > A Wayland compositor has roughly three important things where the kernel > > debugs might come in handy: > > - input > > - DRM KMS > > - DRM GPU rendering > > > > DRM KMS is the one I've been thinking of in the flight recorder context > > the most, because KMS hardware varies a lot, and there is plenty of > > room for both KMS drivers and KMS userspace to go wrong. The usual > > result is your display doesn't work, so the system is practically > > unusable to the end user. In the wild, the simplest or maybe the only > > way out of that may be a reboot, maybe an automated one (e.g. digital > > signage). In order to debug such problems, we would need both > > compositor logs and the relevant kernel debug messages. > > > > For example, Weston already has a flight recorder framework of its own, > > so we have the compositor debug logs. It would be useful to get the > > selected kernel debug logs in the same place. It could be used for > > automated or semi-manual bug reporting, for example, making the > > administrator or end user life much easier reporting issues. > > > > Since this is usually a production environment, and the Wayland > > compositor runs without root privileges, we need something that works > > with that. We would likely want the kernel debug messages in the > > compositor to combine and order them properly with the compositor debug > > messages. > > > > It's quite likely that developers would like to pick and choose which > > kernel debug messages might be interesting enough to record, to avoid > > excessive log flooding. The flight recorder in Weston is fixed size to > > avoid running out of memory or disk space. I can also see that Weston > > could have debugging options that affect which kernel debug messages it > > subscribes to. We can have a reasonable default setup that allows us to > > pinpoint the problem area and figure out most problems, and if needed, > > we could ask the administrator pass another debug option to Weston. It > > helps if there is just one place to configure everything about the > > compositor. > > > > This implies that it would be really nice to have userspace subscriber > > specific debug message streams from the kernel, or a good way to filter > > the messages we want. A Wayland compositor would not be interested in > > file system or wireless debugs for example, but another system > > component might be. There is also a security aspect of which component is > > allowed to see which messages in case they could contain anything > > sensitive (input debug could contain typed passwords). > > > > Configuring the kernel debug message selection for our debug message > > stream can and probably should require elevated privileges, and we can > > likely solve that in userspace with a daemon or such, to allow the > > Wayland compositor to run as a regular user. > > > > Thinking of desktop systems, and especially physically multi-seat systems: > > - there can be multiple different Wayland compositors running simultaneously > > - each of them may want debug messages only from a specific DRM KMS > > device instance, and not from others > > - each of them may have a different idea of which debug messages are important > > - because DRM KMS leasing is a thing, different compositor instances > > could be using the same DRM KMS device instance simultaneously; since > > this is specific to DRM KMS, and it should be harmless to get a > > little too much DRM KMS debug (that is, from the whole device instead > > of just the leased parts), it may not be worth to consider splitting > > debug message streams this far. > > > > If userspace is offered some standardised fields in kernel debug > > structures, then userspace could do some filtering on its own too, but I > > guess it would be better to filter at the source and not need that. > > > > There is also an anti-goal. The kernel debug message contents are > > specifically not machine-parsable. I very much do not want to impose > > debug strings as ABI, they are for human (and AI?) readers only. > > > > > > As a summary, here are the most important requirements first: > > - usable in production as a normal thing to enable always by default > > - final delivery to unprivileged userspace process > > I think this is the one that's trickiest, and I don't fully understand why > you need it. The issues I'm seeing: > > - logs tend to leak a lot of kernel internal state that's useful for > attacks. There's measures for the worst (like obfuscating kernel > pointers by hashing them), so there's always going to be a difference > here between what full sysadmin can get and what unpriviledged userspace > can get. And there's always a risk we miss something that we should > obfuscate but didn't. > > - handing this to userspace increases the risks it becomes uapi. Who's > going to stop compositors from sussing out the reason an atomic commit > failed from the logs if they can get them easily, and these logs contain > very interesting information about why something failed? > > Much better if journald or a crash handler assemebles all the different > flight recorder logs and packages it into a bug report so that the > compositor cannot ever get at these directly. Yeah this needs some OS > support with a dbus request or similar so that the compositor can ask > for a crash dump with everything relevant to its session. > This is similar to what we plan to do in ChromeOS. We want to enable writing debug logs of each subsystem/driver of interest (e.g. thunderbolt) into a separate trace instance and when an issue happens and a user submits a feedback report we will attach the captured logs from trace instances to the report. > - the idea of an in-kernel flight recorder is that it's really fast. The > entire tracing infra is built such that recording an event is really > quick, but printing it is not - the entire string formatting is delayed > to when userspace reads the buffers. If you constantly push the log > messages to userspace we toss the advantage of the low-overhead > in-kernel flight recorder. If you push logs to dmesg there's a > substantial measureable overhead which you don't really want in > production, and your requirement would impose the same. > > - I'm not sure how this is supposed to mesh with userspace log aggregators > like journald when every compositor has it's own flight recorder on top. > Feels a bit like a solution that ignores the entire os stack and assumes > that the only pieces we can touch are the kernel and the compositor to > get to such a flight recorder. > > You might object that events will get out-of-order if you merge multiple > logs after the fact, but that's the case anyway if we use the tracing > framework since that's always a ringbuffer within the kernel and not > synchronous. The only thing we could do is allow userspace to push > markers into that ringbuffer, which is easily done by adding more debug > output lines (heck we could even add a logging cookie to certain ioctl > when userspace really cares about knowing exact ordering of it's > requests with the stuff the kernel does). > > - If you really want direct deliver to userspace I guess we could do > something where sessiond opens the flight recorder fd for you, sets it > all up and passes it to the compositor. But I'm really not a big fan of > sending the full kms dbg spam to compositors to freely digest in real > time. > > > - per debug-print selection of messages (finer or coarser, categories > > within a kernel sub-system could be enough) > > - per originating device (driver instance) selection of messages > > The dyndbg stuff can do all that already, which is why I'm so much in > favour of relying on that framework. > > > - all selections tailored separately for each userspace subscriber > > (- per open device file description selection of messages) > > Again this feels like a userspace problem. Sessions could register what > kind of info they need for their session, and something like journald can > figure out how to record it all. > > If you want the kernel to keep separate flight recorders I guess we could > add that, but I don't think it currently exists for the dyndbg stuff at > least. Maybe a flight recorder v2 feature, once the basics are in. > > > That's my idea of it. It is interesting to see how far the requirements > > can be reasonably realised. > > I think aside from the "make it available directly to unpriviledged > userspace" everything sounds reasonable and doable. > > More on the process side of things, I think Jim is very much looking for > acks and tested-by by people who are interested in better drm logging > infra. That should help that things are moving in a direction that's > actually useful, even when it's not yet entirely complete. > > Cheers, Sima > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch