Quoting Richard Guy Briggs (rgb@xxxxxxxxxx):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb@xxxxxxxxxx):
> > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > > https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > > https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > (https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > > https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > >
> > > I've tried to answer a number of questions that were raised in that thread.
> > >
> > > The goal is not quite identical to Aris' patchset.
> > >
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns. The first patch defines a function to list them.
> > > The second patch provides an example of usage for audit_log_task_info() which
> > > is used by syscall audits, among others. audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > >
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (which is claimed to have had the right to change
> > > reserved and is not necessarily unique if there is more than one proc fs). It
> > > could be argued that the inode numbers have now become a de facto interface and
> > > can't change now, but I'm proposing this approach to see if this helps address
> > > some of the objections to the earlier patchset.
> > >
> > > There could also be messages added to track the creation and the destruction
> > > of namespaces, listing the parent for hierarchical namespaces such as pidns,
> > > userns, and listing other ids for non-hierarchical namespaces, as well as other
> > > information to help identify a namespace.
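To make the serial-number scheme quoted above concrete, here is a rough userspace model in Python. It is only a sketch and not the code from the patch: the `Kernel` class, the `new_namespace()` helper, and the choice of a single counter shared by all namespace types are assumptions made for illustration. It shows why a serial is unique within one boot but says nothing about which kernel issued it.

```python
import itertools


class Kernel:
    """Toy model of one boot of one kernel (hypothetical, not kernel code)."""

    def __init__(self):
        # Assumption: one counter shared by every namespace type,
        # restarted from scratch at each boot.
        self._next_serial = itertools.count(1)

    def new_namespace(self, kind, parent=None):
        # Each new namespace takes the next unused serial number.
        # "parent" records the parent's serial for hierarchical types.
        return {"kind": kind, "serial": next(self._next_serial), "parent": parent}


host_a = Kernel()
init_pid = host_a.new_namespace("pid")
container_pid = host_a.new_namespace("pid", parent=init_pid["serial"])

# A second boot (or a second host) keeps its own count, so the same serial
# values reappear there and refer to unrelated namespaces.
host_b = Kernel()
other = host_b.new_namespace("net")

print(init_pid["serial"], container_pid["serial"], other["serial"])
```

In this model a creation record could carry the `parent` value for pidns and userns, while linking serials across hosts would still need some identifier for the kernel instance itself.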
> > >
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread. net namespaces are now served as peers
> > > by one auditd in the init_net namespace with processes in a non-init_net
> > > namespace being able to write records if they are in the init_user_ns and have
> > > CAP_AUDIT_WRITE. Processes in a non-init_pid_ns can now similarly write
> > > records. As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > of userspace processes that try to join netlink broadcast groups.
> > >
> > >
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel? (I had a brief look at CRIU.) Is there a unique
> > > identifier for each running instance of a kernel? Or at least some identifier
> > > within the container migration realm?
> >
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
>
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
>
> One way that occurred to me to be able to identify a kernel instance was
> to look at CPU serial numbers or other CPU entity intended to be
> globally unique, but that isn't universally available.

That's one issue, which is uniqueness of namespaces cross-machines.

But it gets worse if we consider that after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted); now the
nested container's indexes will all be changed. Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.
But it's worth considering whether we can solve the issue with
serial #s, and, if not, whether we can solve it with any other approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #. Which will immediately lead us to a namespace of
serial #s (since the requested # might be lower than the last used one
on the new host).

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit. Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
>
> Both are dubious in VMs anyways.
>
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials, and no one should care. However,
> > if we're going to be allowing containers to have their own audit
> > namespace/layer/whatever, then this becomes more of a concern.
>
> Having a container have its own audit daemon (partitioned appropriately
> in the kernel) would be a long-term goal.

Agreed, fwiw.

> > That said, I'll now look at the patches while pretending that problem
> > does not exist :) If I ack, it'll be on correctness of the code, but
> > we'll still have to deal with this issue.
>
> Getting some discussion about this migration challenge was a significant
> motivation for posting this patch, so I'm hoping others will weigh in.
>
> Thanks for your review, Serge.
>
> > > What additional events should list this information?
> > >
> > > Does this present any kind of information leak? Only CAP_AUDIT_CONTROL (and
> > > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > > init namespace at the moment.
> > >
> > >
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a separate
> > > record:
> > > type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > >
> > >
> > > Note: This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet. That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd. I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
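For anyone wanting to consume the proposed record format quoted above, the six namespace fields can be pulled out with a short script. This is a minimal sketch, assuming the fields keep the `name=decimal` form shown in the sample record; the `namespace_serials()` helper and the `NS_FIELD` pattern are hypothetical names and not part of the patchset or of any audit userspace tool.

```python
import re

# Sample record copied from the proposed output format quoted above.
RECORD = (
    'type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 '
    'success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 '
    'ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 '
    'sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" '
    'exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 '
    'userns=3 subj=system_u:system_r:init_t:s0 key=(null)'
)

# Assumption: each namespace field is "<name>=<decimal serial>".
NS_FIELD = re.compile(r"\b(mntns|netns|utsns|ipcns|pidns|userns)=(\d+)")


def namespace_serials(record):
    """Return the namespace serial fields of one audit record as a dict."""
    return {name: int(value) for name, value in NS_FIELD.findall(record)}


print(namespace_serials(RECORD))
```

Because the fields use distinct labels such as `pidns=` rather than `pid=`, they do not collide with the existing `pid=` and `ppid=` fields in the same record.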
> > >
> > >
> > > Richard Guy Briggs (2):
> > >   namespaces: give each namespace a serial number
> > >   audit: log namespace serial numbers
> > >
> > >  fs/mount.h                     |  1 +
> > >  fs/namespace.c                 |  1 +
> > >  include/linux/audit.h          |  7 +++++++
> > >  include/linux/ipc_namespace.h  |  1 +
> > >  include/linux/nsproxy.h        |  8 ++++++++
> > >  include/linux/pid_namespace.h  |  1 +
> > >  include/linux/user_namespace.h |  1 +
> > >  include/linux/utsname.h        |  1 +
> > >  include/net/net_namespace.h    |  1 +
> > >  init/version.c                 |  1 +
> > >  ipc/msgutil.c                  |  1 +
> > >  ipc/namespace.c                |  2 ++
> > >  kernel/audit.c                 | 38 ++++++++++++++++++++++++++++++++++++++
> > >  kernel/nsproxy.c               | 24 ++++++++++++++++++++++++
> > >  kernel/pid.c                   |  1 +
> > >  kernel/pid_namespace.c         |  2 ++
> > >  kernel/user.c                  |  1 +
> > >  kernel/user_namespace.c        |  2 ++
> > >  kernel/utsname.c               |  2 ++
> > >  net/core/net_namespace.c       |  4 +++-
> > >  20 files changed, 99 insertions(+), 1 deletions(-)
> > >
> > > _______________________________________________
> > > Containers mailing list
> > > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers
>
> - RGB
>
> --
> Richard Guy Briggs <rbriggs@xxxxxxxxxx>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545