On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
> On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
>> On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat <cbosdonnat@xxxxxxxx> wrote:
>>> Some programs want to change some values for the network interfaces
>>> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
>>> allows wicked to work on openSUSE 13.2+.
>>>
>>> In order to mount those folders RW but keep the rest of /proc/sys RO,
>>> we add temporary mounts for these folders before bind-mounting
>>> /proc/sys. Those mounts will be skipped if the container doesn't have
>>> its own network namespace.
>>>
>>> It may happen that one of the temporary mounts in /proc/ filesystem
>>> isn't available due to a missing kernel feature. We must not fail
>>> in that case.
>>
>> IMHO we should drop the read-only /proc mount completely.
>> The idea behind having a read-only /proc was to make a container less
>> insecure because user namespaces did not exist yet.
>
> Yep, read-only /proc was a (failed) attempt to predict the future - we
> originally expected we'd need that even when user namespaces arrived,
> but of course in the end it was a waste of time.

Correct. Let's reduce this waste of time and not add more code. :-)

>> Now that user namespaces are mainline and considered stable we should
>> start dropping such hacks instead of adding more of them.
>
> I'm trying to think if there are any backwards compatibility problems
> if we got rid of read-only /proc but I can't imagine any app out there
> is actively checking for a read-only /proc, so we'd probably be safe
> to just switch it read-write.

Same here. I'd be astonished if an application broke because /proc is rw.
BTW: While we are here, let's make /sys/ also rw.
Again, if an application can do bad things, this is a plain kernel bug.

>> As a consequence of that, libvirt has to decide what kind of container it
>> wants to support.
>> IMHO the only sane way is to enforce user namespaces to provide
>> reasonable isolation.
>> If a user can do bad things with a read-write /proc it needs to be
>> fixed in the kernel and not in libvirt.
>>
>> Containers without user namespaces and a root within are insecure and
>> broken by design.
>
> Well addition of MAC can make them secure, but of course if you have
> MAC, there's again no need to make /proc mount read-only.

The MAC policy has to be *perfect* and has to use whitelisting.
Also, if you make your MAC too restrictive you'll break certain programs.
You need more than just denying access to some magic files in /sys and /proc.
If you deny, for example, mount(2), many applications will break, most
notably systemd.

I propose the following:
a) Make /sys and /proc read-write
b) If one creates a container without any uid/gid mapping, print a big fat
   warning that such a container is not suitable for hostile guests.

If the user has a specific use case where he can trust all guests, fine.
But we have to document it clearly.
Maybe a new config flag a la <i_know_what_i_m_doing/> would help too. ;-)

Thanks,
//richard

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
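
As an illustration of proposal (b) in the message above, here is a minimal
Python sketch of such a warning. It is not libvirt code: the names
has_id_mapping and warn_if_unmapped, the sample XML and the warning text are
made up for this example. It only assumes that the container's domain XML
carries its uid/gid mapping in an <idmap> element with <uid> and <gid>
children, and it treats a container with only one of the two as unmapped.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical warning text; the thread does not propose exact wording.
WARNING = (
    "WARNING: container has no uid/gid mapping (no user namespace); "
    "it is not suitable for hostile guests."
)

# Sample domain XML snippets, for illustration only.
MAPPED = """
<domain type='lxc'>
  <name>mapped</name>
  <idmap>
    <uid start='0' target='100000' count='65536'/>
    <gid start='0' target='100000' count='65536'/>
  </idmap>
</domain>
"""

UNMAPPED = """
<domain type='lxc'>
  <name>unmapped</name>
</domain>
"""


def has_id_mapping(domain_xml):
    """Return True only if the XML has an <idmap> with both <uid> and <gid>."""
    root = ET.fromstring(domain_xml)
    idmap = root.find("idmap")
    if idmap is None:
        return False
    return idmap.find("uid") is not None and idmap.find("gid") is not None


def warn_if_unmapped(domain_xml, out=None):
    """Print the warning and return True if the container lacks an id mapping."""
    out = sys.stderr if out is None else out
    if has_id_mapping(domain_xml):
        return False
    print(WARNING, file=out)
    return True
```

Calling warn_if_unmapped(UNMAPPED) prints the warning to stderr and returns
True, while warn_if_unmapped(MAPPED) stays silent and returns False. A config
flag like the one joked about above could be checked in the same place to
silence the warning for users who trust all their guests.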