Quoting Gao feng (gaofeng@xxxxxxxxxxxxxx):
> On 08/26/2013 11:19 AM, James Bottomley wrote:
> > On Mon, 2013-08-26 at 09:06 +0800, Gao feng wrote:
> >> On 08/26/2013 02:16 AM, James Bottomley wrote:
> >>> On Sun, 2013-08-25 at 19:37 +0200, Kay Sievers wrote:
> >>>> On Sun, Aug 25, 2013 at 7:16 PM, James Bottomley
> >>>> <jbottomley@xxxxxxxxxxxxx> wrote:
> >>>>> On Wed, 2013-08-21 at 11:51 +0200, Kay Sievers wrote:
> >>>>>> On Wed, Aug 21, 2013 at 9:22 AM, Gao feng <gaofeng@xxxxxxxxxxxxxx> wrote:
> >>>>>>> On 08/21/2013 03:06 PM, Eric W. Biederman wrote:
> >>>>>>
> >>>>>>>> I suspect libvirt should simply not share /run or any other normally
> >>>>>>>> writable directory with the host. Sharing /run /var/run or even /tmp
> >>>>>>>> seems extremely dubious if you want some kind of containment, and
> >>>>>>>> without strange things spilling through.
> >>>>>>
> >>>>>> Right, /run or /var cannot be shared. It's not only about sockets,
> >>>>>> many other things will also go really wrong that way.
> >>>>>
> >>>>> This is very narrow thinking about what a container might be and will
> >>>>> cause trouble as people start to create novel uses for containers in the
> >>>>> cloud if you try to impose this on our current infrastructure.
> >>>>>
> >>>>> One of the cgroup only container uses we see at Parallels (so no
> >>>>> separate filesystem and no net namespaces) is pure apache load balancer
> >>>>> type shared hosting. In this scenario, base apache is effectively
> >>>>> brought up in the host environment, but then spawned instances are
> >>>>> resource limited using cgroups according to what the customer has paid.
> >>>>> Obviously all apache instances are sharing /var and /run from the host
> >>>>> (mostly for logging and pid storage and static pages).
> >>>>> The reason some
> >>>>> hosters do this is that it allows much higher density simple web serving
> >>>>> (either static pages from quota limited chroots or dynamic pages limited
> >>>>> by database space constraints) because each "instance" shares so much
> >>>>> from the host. The service is obviously much more basic than giving
> >>>>> each customer a container running apache, but it's much easier for the
> >>>>> hoster to administer and it serves the customer just as well for a large
> >>>>> cross section of use cases and for those it doesn't serve, the hoster
> >>>>> usually has separate container hosting (for a higher price, of course).
> >>>>
> >>>> The "container" as we talk about has its own init, and no, it cannot
> >>>> share /var or /run.
> >>>
> >>> This is what we would call an IaaS container: bringing up init and
> >>> effectively a new OS inside a container is the closest containers come
> >>> to being like hypervisors. It's the most common use case of Parallels
> >>> containers in the field, so I'm certainly not telling you it's a bad
> >>> idea.
> >>>
> >>>> The stuff you talk about has nothing to do with that, it's not
> >>>> different from all services or a multi-instantiated service on the
> >>>> host sharing the same /run and /var.
> >>>
> >>> I gave you one example: a really simplistic one. A more sophisticated
> >>> example is a PaaS or SaaS container where you bring the OS up in the
> >>> host but spawn a particular application into its own container (this is
> >>> essentially similar to what Docker does). Often in this case, you do
> >>> add separate mount and network namespaces to make the application
> >>> isolated and migratable with its own IP address.
> >>> The reason you share
> >>> init and most of the OS from the host is for elasticity and density,
> >>> which are fast becoming a holy grail type quest of cloud orchestration
> >>> systems: if you don't have to bring up the OS from init and you can just
> >>> start the application from a C/R image (orders of magnitude smaller than
> >>> a full system image) and slap on the necessary namespaces as you clone
> >>> it, you have something that comes online in milliseconds which is a feat
> >>> no hypervisor based virtualisation can match.
> >>>
> >>> I'm not saying don't pursue the IaaS case, it's definitely useful ...
> >>> I'm just saying it would be a serious mistake to think that's the only
> >>> use case for containers and we certainly shouldn't adjust Linux to serve
> >>> only that use case.
> >>>
> >>
> >> The feature you said above VS container-reboot-host bug, I prefer to fix
> >> the bug.
> >
> > What bug?
> >
> >> and this feature can be achieved even container unshares /run directory
> >> with host by default, for libvirt, user can set the container
> >> configuration to make the container shares the /run directory with host.
> >>
> >> I would like to say, the reboot from container bug is more urgent and need
> >> to be fixed.
> >
> > Are you talking about the old bug where trying to reboot an lxc
> > container from within it would reboot the entire system?
>
> Yes, we are discussing this problem in this whole thread.
>
> > If so, OpenVZ
> > has never suffered from that problem and I thought it was fixed
> > upstream. I've not tested lxc tools, but the latest vzctl from the
> > openvz website will bring up a container on the vanilla 3.9 kernel
> > (provided you have USER_NS compiled in) can also be used to reboot the
> > container, so I see no reason it wouldn't work for lxc as well.
> >
>
> I'm using libvirt lxc not lxc-tools.
> Not all of users enable user namespace, I trust these container management
> tools can have right/proper setting which inhibit this reboot-problem occur.
> but I don't think this reboot-problem won't happen in any configuration.

On any recent kernel, the reboot syscall from inside a non-init pid-ns
will not reboot the host. If from within a non-init pid-ns you are
managing to reboot the host, then you have a problem with how userspace
is set up. The container is being allowed to request init on the host
to do the reboot, i.e. by sharing the /dev/initctl inode with the host, or
by being in the same net namespace as upstart on the host. The fact that
it's possible to create such containers is not a bug.

(On older kernels, you have to drop CAP_SYS_BOOT to prevent use of the
reboot system call, as all lxc-like programs did.)

-serge

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
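
Editorial note: the point about dropping CAP_SYS_BOOT on older kernels can be checked from inside a container. The following is a minimal sketch, not anything taken from libvirt or lxc, that tests whether CAP_SYS_BOOT is still present in a process's capability bounding set by parsing the CapBnd field of /proc/<pid>/status. The function names are hypothetical; the bit number 22 is the value of CAP_SYS_BOOT in <linux/capability.h>.

```python
# Hypothetical helper names; only the CapBnd field and the bit number are real.
CAP_SYS_BOOT = 22  # bit number of CAP_SYS_BOOT in <linux/capability.h>


def parse_cap_mask(status_text, field="CapBnd"):
    """Return the capability mask named `field` from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == field:
            # The kernel prints the mask as a hexadecimal string.
            return int(value.strip(), 16)
    raise ValueError(f"{field} not found in status text")


def has_capability(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def bounding_set_allows_reboot(status_path="/proc/self/status"):
    """Check the calling process's bounding set for CAP_SYS_BOOT (Linux only)."""
    with open(status_path) as status:
        return has_capability(parse_cap_mask(status.read()), CAP_SYS_BOOT)
```

Run inside the container, bounding_set_allows_reboot() returning False would indicate that the management tool dropped CAP_SYS_BOOT. It says nothing about the userspace paths mentioned above (a shared /dev/initctl or a shared net namespace), which have to be checked separately.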