Serge E. Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano@xxxxxxx):
>> Serge E. Hallyn wrote:
>>> Quoting Jean-Marc Pigeon (jmp@xxxxxxx):
>>>> Hello,
>>>>
>>>>> I was wondering out loud about the best design to solve his problem.
>>>>>
>>>>> If we try to redirect kernel-generated messages to containers, we have
>>>>> several problems, including whether we need to duplicate the messages
>>>>> to the host container. So in one sense it seems more flexible to
>>>>> 1. send everything to host syslog
>>>>
>>>> No, if we do that all CONTs' messages will reach
>>>> the same bucket and it will be difficult to sort
>>>> them out..
>>>> CONT sys_admin and HOST sys_admin could be different
>>>> "entities", so you debug CONT config and critical
>>>> needed information reaches HOST (which you do not have access
>>>> to).
>>>
>>> Yes, so a privileged task on HOST must pass that information back to
>>> you on CONT. That is not a valid complaint imo. But how to sort the
>>> msgs out is a valid question.
>>>
>>> We need some sort of identifier, unique system-wide, attached to.. something.
>>> Is ifindex unique system-wide right now? Oh, IIRC it is, but we want it to
>>> be containerized, so that would be a bad choice :)
>>>
>>>>> 2. clamp down on syslog use by processes not in the init_user_ns
>>>>
>>>> Could you give me more detail?
>>>
>>> Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
>>> altogether from a container, or to only allow reading/writing messages
>>> to own syslog. (I had hoped to find time to try out the second option but
>>> simply haven't had the time, and it doesn't look like I will very soon.
>>> So if anyone else wants to, pls jump at it...)
>>>
>>> Then /proc/kmsg can provide what I described above through a FUSE file,
>>> and if, as you mentioned, the container unmounts the FUSE fs and gets
>>> to real procfs, they just get nothing.
>>>
>>>>> 3. let the userspace on the host copy messages into a socket or
>>>>> file so child container can pretend it has real syslog.
>>>>
>>>> So you trap printk messages from CONT on the HOST and
>>>> redirect them to CONT but on a standard syslog channel.
>>>> Seems OK to me, as long as /proc/kmsg does not exist
>>>> (/dev/null) in the CONT file tree.
>>
>> We have:
>>  * Commands to sys_syslog:
>>  *
>>  *      0 -- Close the log.  Currently a NOP.
>>  *      1 -- Open the log. Currently a NOP.
>>  *      2 -- Read from the log.
>>  *      3 -- Read all messages remaining in the ring buffer.
>>  *      4 -- Read and clear all messages remaining in the ring buffer
>>  *      5 -- Clear ring buffer.
>>  *      6 -- Disable printk to console
>>  *      7 -- Enable printk to console
>>  *      8 -- Set level of messages printed to console
>>  *      9 -- Return number of unread characters in the log buffer
>>  *     10 -- Return size of the log buffer
>>
>> And add:
>>  *     11 -- create a new ring buffer for the current process
>>            and its children
>>
>> We have, let's say, a global ring buffer kept untouched, used by
>> syslog(2) and printk. When we create a new ring buffer, we allocate
>> it and assign it to the nsproxy (the global ring buffer is the default
>> in the nsproxy).
>>
>> The printk keeps writing to the global ring buffer and the syslog(2)
>> writes to the "namespaced" ring buffer.
>>
>> Does it make sense?
>
> Yeah, it's a nice alternative. Though (1) there is something to be said for
> forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
> new ring buffer is pointed to from nsproxy, it might be frowned upon to do
> an unshare/clone action in yet another way.

Why do you want to tie clone(CLONE_NEWUSER) to a new ring buffer? I mean
one may want to use CLONE_NEWUSER but keep the ring buffer, no?

> I still think our first concern should be safety, and that we should consider
> just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
> clone(CLONE_NEWUSER). Any sys_syslog() or /proc/kmsg access returns -EINVAL
> after that. Then we can discuss whether and how to target printks to
> namespaces, and whether duplicates should be sent to parent namespaces.

It makes sense to do it step by step. Targeting the printk is the more
difficult part, no? I mean you would always need to have the destination
namespace available, which is not obvious when printk is called from
interrupt context.

> After we start getting flexible with syslog, the next request will be for
> audit flexibility. I don't even know how our netlink support suffices for
> that right now.
>
> (So, this all does turn into a big deal...)

Mmh ... right.

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
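
To make the two proposals in this thread a bit more concrete, here is a small
userspace C model. It is only a sketch under stated assumptions: struct
nsproxy_model, printk_model(), syslog_new_buffer() and the other names are
hypothetical and are not kernel APIs, and the buffer is a plain truncating
array rather than a real ring. It models (a) the proposed command 11, where
printk keeps writing to the global buffer while syslog(2) writes go to the
namespaced one, and (b) the "safety first" variant, where a NULL syslog
pointer makes every access return -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN 256   /* tiny buffer, for illustration only */

/* Hypothetical per-namespace log buffer; not the real kernel structure. */
struct syslog_struct {
    char   buf[LOG_BUF_LEN];
    size_t len;           /* bytes currently stored */
};

/* Stand-in for the part of nsproxy that would point at a log buffer. */
struct nsproxy_model {
    struct syslog_struct *syslog;   /* NULL models "syslog access refused" */
};

/* The global buffer, which stays the default for every namespace. */
struct syslog_struct global_log;

/* Append text, truncating once the buffer is full (no wrap-around here). */
void log_append(struct syslog_struct *log, const char *msg)
{
    size_t n = strlen(msg);
    size_t room = LOG_BUF_LEN - log->len;

    if (n > room)
        n = room;
    memcpy(log->buf + log->len, msg, n);
    log->len += n;
    assert(log->len <= LOG_BUF_LEN);
}

/* printk model: always writes to the global buffer, whoever the caller is. */
void printk_model(const char *msg)
{
    log_append(&global_log, msg);
}

/* Model of proposed command 11: switch the namespace to a fresh buffer. */
int syslog_new_buffer(struct nsproxy_model *ns, struct syslog_struct *fresh)
{
    memset(fresh, 0, sizeof(*fresh));
    ns->syslog = fresh;
    return 0;
}

/* Model of a write through syslog(2): goes to the namespaced buffer. */
int syslog_write_model(struct nsproxy_model *ns, const char *msg)
{
    if (!ns->syslog)
        return -EINVAL;   /* the NULL-on-CLONE_NEWUSER variant */
    log_append(ns->syslog, msg);
    return 0;
}

/* Number of bytes held in the caller's buffer, or -EINVAL if it has none. */
int syslog_buffered_bytes(const struct nsproxy_model *ns)
{
    if (!ns->syslog)
        return -EINVAL;
    return (int)ns->syslog->len;
}
```

In this model a container that called syslog_new_buffer() sees only its own
syslog_write_model() messages, while everything sent through printk_model()
still lands in global_log on the host. Setting ns->syslog to NULL instead
gives the stricter behaviour, with no per-container buffer at all.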