On 08.01.2015 at 14:02, Daniel P. Berrange wrote:
> We have historically done a number of things with LXC that are
> somewhat questionable in retrospect
>
>  1. Mounted /proc/sys read-only, but then mounted
>     /proc/sys/net/ipv* read-write again
>  2. Mounted /sys read only
>  3. Mount /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
>  4. FUSE mount on /proc/meminfo
>
> Items 1 & 2 are pointless as they offer no security benefit either
> with or without user namespaces. Without userns it is always insecure,
> with userns it is always secure, no matter what the mount state is.

I agree. Thanks a lot for addressing this, Daniel!

> Item 3 is somewhat dubious, since /proc/self/cgroup paths for
> processes are now not visible at /sys/fs/cgroup. This really
> confuses systemd inside the container making it create a broken
> layout

The question is how to support systemd in containers. As of now I'm
not aware of a working concept. With current libvirt it kind of works,
but recently I found a very nasty issue. See:
https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html

Maybe it works with cgroup namespaces, i.e. such that systemd can mount
cgroupfs within the container in a secure way.
The current discussion can be found here:
https://lkml.org/lkml/2015/1/7/150

As of now I have to drop all my systemd LXC guests and will replace
them with a non-systemd distro, which is very sad. :-(

> Item 4 is somewhat dubious, since we're only changing some of the
> fields in /proc/meminfo. It helps apps which blindly parse
> /proc/meminfo to determine free system resources they can consume.
> Those apps are broken even without containers being involved though,
> since any application must expect to be placed inside a cgroup with
> limited resources. Faking /proc/meminfo is a pretty limited workaround
> that just delays the inevitable fixing of such apps..

You mean that tools like free(1) have to be patched to also query
memory limits from cgroupfs?
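
To make that question concrete, here is a rough sketch of what such a
tool would have to do: read MemTotal from /proc/meminfo and cap it by
the memory limit of its own cgroup. This is only an illustration, not
how free(1) works today. The function names are made up, and the
cgroup v1 path in the usage note is an assumption about where the
memory controller is mounted inside the guest.

```python
from typing import Optional


def parse_meminfo_total(meminfo_text: str) -> int:
    """Return MemTotal in bytes from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Format is "MemTotal:   <value> kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_cgroup_limit(limit_text: str) -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if unlimited.

    cgroup v1 (memory.limit_in_bytes) reports a plain byte count;
    cgroup v2 (memory.max) may report the literal string "max".
    """
    text = limit_text.strip()
    if text == "max":
        return None
    return int(text)


def effective_memory_total(meminfo_text: str, limit_text: str) -> int:
    """Memory the process can really use: min(host total, cgroup limit)."""
    host_total = parse_meminfo_total(meminfo_text)
    limit = parse_cgroup_limit(limit_text)
    if limit is None:
        return host_total
    # An "unlimited" v1 cgroup reports a huge number, so min() still
    # falls back to the host total in that case.
    return min(host_total, limit)
```

A caller would feed it the contents of /proc/meminfo and of, for
example, /sys/fs/cgroup/memory/memory.limit_in_bytes (assumed path),
provided the guest can see its own cgroup directory there.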

> The patch that follows just removes the items 1 & 2, but I'm thinking
> we should go further and remove items 3 & 4 too.
>
> Changing 4 in particular though is certainly classed as a guest ABI
> change though, so is not something distros may wish to see when
> upgrading libvirt. There is scope to argue that 1-3 are guest ABI
> changes too
>
> In full machine virt world, we deal with this using machine types.
> eg each new KVM version introduces a new machine type which models
> the guest ABI in a stable fashion. Guest machine types are fixed at
> time of first deployment. So when libvirt / KVM is upgraded, existing
> guests will not see any changes, but new guests will automatically
> get the new machine type.
>
> I'm thinking we might want to make use of this in LXC before making
> these changes. eg introduce a new machine 'libvirt-lxc-1' to
> represent the current guest mount setup and make sure all existing
> guests get that machine type. Then introduce a new machine type
> libvirt-lxc-2 that removes all this cruft, which new guests will
> get by default.
>
> Alternatively we could call them 'libvirt-lxc-compat-1' and
> 'libvirt-lxc-bare-1' to give a clearer indication of their
> functional difference and version them separately in the future ?

Can we have a new machine type which enforces user namespaces?

> Regards,
> Daniel
>
> Daniel P. Berrange (1):
>   lxc: Stop mouning /proc and /sys read only
>
>  src/lxc/lxc_container.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)

Acked-by: Richard Weinberger <richard@xxxxxx>

Thanks,
//richard