Hi Beau,

On Thu, 1 Jun 2023 09:29:21 -0700
Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:

> > > These are stubs to integrate namespace support. I've been working on a
> > > series that adds a tracing namespace support similar to the IMA
> > > namespace work [1]. That series is ending up taking more time than I
> >
> > Look, this is all well and nice but you've integrated user events with
> > tracefs. This is currently a single-instance global filesystem. So what
> > you're effectively implying is that you're namespacing tracefs by
> > hanging it off of struct user namespace making it mountable by
> > unprivileged users. Or what's the plan?
> >
>
> We don't have plans for unprivileged users currently. I think that is a
> great goal and requires a proper tracing namespace, which we currently
> don't have. I've done some thinking on this, but I would like to hear
> your thoughts and others on how to do this properly. We do talk about
> this in the tracefs meetings (those might be out of your time zone
> unfortunately).
>
> > That alone is massive work with _wild_ security implications. My
> > appetite for exposing more stuff under user namespaces is very low given
> > the amount of CVEs we've had over the years.
> >
>
> Ok, I based that approach on the feedback given in LPC 2022 - Containers
> and Checkpoint/Restore MC [1]. I believe you gave feedback to use user
> namespaces to provide the encapsulation that was required :)

Even with the user namespace, I think we still need to provide a separate
"eventname-space" for each application, since it may depend on the
context: who launched it and where. I think the easiest solution is
(perhaps) providing a new PID-based group for each instance (the PID
prefix or suffix would be hidden from the application).
I don't think it is good to allow unprivileged user processes to detect
each other's registered event names by default.

>
> > > anticipated.
> >
> > Yet you were confident enough to leave the namespacing stubs for this
> > functionality in the code. ;)
> >
> > What is the overall goal here? Letting arbitrary unprivileged containers
> > define their own custom user event type by mounting tracefs inside
> > unprivileged containers? If so, what security story is going to
> > guarantee that writing arbitrary tracepoints from random unprivileged
> > containers is safe?
> >
>
> Unprivileged containers is not a goal, however, having a per-pod
> user_event system name, such as user_event_<pod_name>, would be ideal
> for certain diagnostic scenarios, such as monitoring the entire pod.

That can be done in the user-space tools, not in the kernel.

> When you have a lot of containers, you also want to limit how many
> tracepoints each container can create, even if they are given access to
> the tracefs file. The per-group can limit how many events/tracepoints
> that container can go create, since we currently only have 16-bit
> identifiers for trace_event's we need to be cautious we don't run out.

I agree, we need a knob to limit it, to avoid DoS attacks.

> user_events in general has tracepoint validators to ensure the payloads
> coming in are "safe" from what the kernel might do with them, such as
> filtering out data.

[...]
> > > changing the system name of user_events on a per-namespace basis.
> >
> > What is the "system name" and how does it protect against namespaces
> > messing with each other?
>
> trace_events in the tracing facility require both a system name and an
> event name. IE: sched/sched_waking, sched is the system name,
> sched_waking is the event name. For user_events in the root group, the
> system name is "user_events". When groups are introduced, the system
> name can be "user_events_<GUID>" for example.

So my suggestion is to use the PID in the root PID namespace instead of
a GUID by default.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
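
For illustration only, the naming and limiting ideas discussed above could be modelled in user space roughly as follows. This is a sketch under assumptions, not kernel code: the class EventGroup, its register() method, and the per-group limit value are all hypothetical names invented for this example. The only details taken from the thread are the "system/event" naming, the "user_events_<...>" system name pattern, the use of a root-namespace PID in place of a GUID, and the 16-bit identifier space.

```python
# Illustrative user-space model only; none of these names exist in the kernel.
MAX_EVENT_IDS = 1 << 16  # trace_event identifiers are 16-bit, per the thread


class EventGroup:
    """Hypothetical per-instance group keyed by the root-namespace PID."""

    def __init__(self, root_pid, max_events):
        # The application never sees the PID suffix; it only passes event names.
        self.system_name = f"user_events_{root_pid}"
        # A per-group knob, capped by the global 16-bit identifier space.
        self.max_events = min(max_events, MAX_EVENT_IDS)
        self.events = set()

    def register(self, event_name):
        """Register an event and return its full 'system/event' name."""
        if event_name not in self.events:
            if len(self.events) >= self.max_events:
                raise RuntimeError(f"{self.system_name}: event limit reached")
            self.events.add(event_name)
        return f"{self.system_name}/{event_name}"
```

With this model, two instances registering the same event name end up under different system names (for example "user_events_1234/foo" and "user_events_5678/foo"), and a group that hits its limit gets an error instead of consuming more identifiers.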