On Mon, 2015-09-07 at 13:23 +0100, Daniel P. Berrange wrote:
> On Thu, Sep 03, 2015 at 11:51:16AM +0200, Cédric Bosdonnat wrote:
> > We already have a fuse mount to reflect the cgroup memory restrictions
> > in the container. This commit adds the same for the number of available
> > CPUs. Only the CPUs listed by virProcessGetAffinity are shown in the
> > container's cpuinfo.
> 
> So this (re-)raises some interesting / difficult questions that I'm
> not sure we have a good answer to.
> 
> The main concern is that actually this is not really a problem specific
> to containers, rather it is related to cgroup resource confinement.
> ie the cgroup has confined a process(es) to a set of CPUs and the process
> is using /proc/cpuinfo to count CPUs and so is wrong. Cgroups are being
> increasingly widely used in Linux, particularly since systemd, so pretty
> much any process has to expect that it can be confined to a subset of
> CPUs.

I agree.

> IOW, any application using /proc/cpuinfo to determine "available"
> resource is already broken, even when run on bare metal. The same also
> applies to the use of /proc/meminfo, which we previously faked via
> fuse.
> 
> So the question is whether we should invest time trying to fake the
> /proc/cpuinfo in containers, when any apps we'd be fixing are already
> broken in bare metal. Apps might have avoided /proc/cpuinfo and instead
> be trying /sys/devices/system/cpu/ which your patch isn't trying to
> fake. This is just as broken, because sysfs doesn't reflect cgroup
> confinement either.

I agree /sys/devices/system/cpu should be patched too... but it contains
much more subtle things to handle. At least, I don't know that filesystem
well enough to fake it properly.
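For illustration only (this is not part of the patch and not libvirt's virProcessGetAffinity), a minimal C sketch of the alternative discussed above: instead of counting entries in /proc/cpuinfo, a process can count the CPUs it may actually run on from its own affinity mask using sched_getaffinity(2) and CPU_COUNT (glibc, requires _GNU_SOURCE). Confinement by the cpuset cgroup is reflected in that mask, whereas sysconf(_SC_NPROCESSORS_ONLN) generally reports the CPUs online system-wide. The helper names usable_cpus() and host_online_cpus() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs the calling process may run on.
 * The affinity mask already reflects cpuset cgroup confinement.
 * Returns -1 on error. A fixed cpu_set_t only covers CPU_SETSIZE CPUs. */
static int usable_cpus(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) < 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Hypothetical helper: CPUs online on the whole host, i.e. roughly what
 * a naive count of /proc/cpuinfo "processor" entries would give on x86. */
static long host_online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Two caveats with this approach: hosts with more than CPU_SETSIZE (1024) CPUs would need a dynamically sized mask (CPU_ALLOC / CPU_ALLOC_SIZE), and CPU bandwidth limits (CFS quota) do not shrink the affinity mask, so a process confined only by quota would still over-count.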
> I think what is ultimately needed for applications is some kind of
> libresource.so library that they can use to query what resources
> are available in their compute environment, which can intelligently
> query cgroups directly, and ignore the legacy /proc & /sys interfaces
> for counting memory / cpu availability. I don't think that's something
> that libvirt should solve - if anything it could be systemd, or a
> standalone project.

OK, then it's not something that will be available in a reasonable time
frame unless we start it ourselves. Do you know if anyone in another
project is working on that problem?

> So I'm increasingly convinced that LXC should not try to fake out
> any /proc & /sys file content, and instead document the limitations.
> I'm also thinking that we should kill off our existing meminfo fake
> fuse at some point.

OK.

> The more minor concern I have is around the implementation. AFAIR, the
> /proc/cpuinfo file contents is not standardized across architectures,
> so I'm concerned whether your parsing code is robust on non-x86 arches.

Hum... I didn't even know that file's format varied between arches.

-- 
Cedric

-- 
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list