Cedric Le Goater wrote:
> Held at De Vere University Arms Hotel, Cambridge, UK
>
> * Monday, Sept 3, 9h00 to 16h00:
>
>   Kir Kolyshkin <kir@xxxxxxxxxx>
>   Pavel Emelianov <xemul@xxxxxxxxxx>
>   Masahiko Takahashi <masahiko@xxxxxxxxxxxxxxxxxxxx>
>   Oren Laadan <orenl@xxxxxxxxxxxxxxx>
>   James Youngman <youngman@xxxxxxxxxx>
>   ??? (NTT)
>   Cedric Le Goater <clg@xxxxxxxxxx>
>
>   On the phone (Skype, with a very high noise level):
>
>   Paul Menage <menage@xxxxxxxxxx>
>   Srivatsa Vaddagiri <vatsa@xxxxxxxxxx>
>   Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx>
>   Vaidyanathan Srinivasan <svaidy@xxxxxxxxxx>
>
> * Tuesday, Sept 4, 15h00 to 18h00:
>
>   Pavel Emelianov <xemul@xxxxxxxxxx>
>   Paul Menage <menage@xxxxxxxxxx>
>   Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
>   Cedric Le Goater <clg@xxxxxxxxxx>
>
> = Namespace status
> ======================================================================
>
> * sysv ipc
>
>   extend to posix mqueue.
>   . check that /dev/mqueue can be mounted multiple times
>   . mqueue sysctls will need a fix:
>       fs.mqueue.queues_max
>       fs.mqueue.msg_max
>       fs.mqueue.msgsize_max
>
> * uname namespace
>
>   considered complete.
>
>   what about being able to set the kernel version?
>
> * user
>
>   useful today to current container technologies (openvz, vserver)
>
>   uid checks should be replaced by (uid, userns) checks to complete
>   integration with filesystems
>   security needs to be looked at
>   so does signal delivery
>
> * pid namespace
>
>   in dev
>
>   signal handling completion underway
>   pid_t cleanups
>   . the purpose is to remove any explicit reference to
>     task->pid
>   . keep ->pid in task struct only for performance
>   . complex cleanups:
>       af_unix credentials
>       file locks
>       timer stat
>
>   kthread cleanup
>   . replace kernel_thread() by the kthread API
>   . change core kthread API to support signals
>   . then nfs needs extra love. is someone working on it?
>
>   do we need hierarchical levels?
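The pid_t cleanups and the "hierarchical levels" question are linked. Once pid namespaces nest, a task no longer has a single pid, so code that reads task->pid directly can report the wrong number to a caller inside a container. Below is a toy user-space model of one possible answer, not kernel code. It assumes one allocation counter per namespace and gives each task one number per level, from the initial namespace down to its own. All names (pid_ns, toy_pid, alloc_pid, pid_nr_ns) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 4

/* Hypothetical toy model of a pid namespace (not the kernel's struct). */
struct pid_ns {
    int level;              /* 0 = initial namespace, parent level + 1 when nested */
    struct pid_ns *parent;  /* NULL for the initial namespace */
    int last_pid;           /* per-namespace allocation counter */
};

/* A task's pid as a vector: one number per level, from level 0 up to
 * the level of the namespace the task was created in. */
struct toy_pid {
    int level;
    int nr[MAX_LEVELS];                  /* nr[i]: number seen at level i */
    const struct pid_ns *ns[MAX_LEVELS]; /* namespace that owns nr[i] */
};

/* Allocate a pid for a new task in 'ns'. The task gets a fresh number
 * in 'ns' and in every ancestor, so each level sees it under that
 * level's own numbering. */
struct toy_pid alloc_pid(struct pid_ns *ns)
{
    struct toy_pid p = { .level = ns->level };

    assert(ns->level < MAX_LEVELS);
    for (struct pid_ns *n = ns; n != NULL; n = n->parent) {
        p.nr[n->level] = ++n->last_pid;
        p.ns[n->level] = n;
    }
    return p;
}

/* Stand-in for reading task->pid directly: the answer depends on who is
 * looking. Returns 0 when the task is not visible from 'viewer', either
 * because the viewer is nested deeper than the task or because it is a
 * sibling namespace. */
int pid_nr_ns(const struct toy_pid *p, const struct pid_ns *viewer)
{
    if (viewer->level > p->level || p->ns[viewer->level] != viewer)
        return 0;
    return p->nr[viewer->level];
}
```

In this model a container's first task is pid 1 inside the container but has a different number in the parent namespace. A task is invisible (0) from a child or sibling namespace. Whether full nesting is needed, or a single level would do, is the open question above.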
>
> * net
>
>   in dev
>
>   veth is in dmiller's tree
>   sysfs cleanups underway in greg's tree
>   eric is working on a minimal patchset acceptable for netdev. will
>   ask dmiller's advice on the topic
>
>   ip isolation could be done with netfilter or security hooks
>
> * device namespace
>
>   to do
>
>   we don't want to get rid of mknod() but we also want to limit the
>   view of the devices in a container. one way to do this is through a
>   device namespace which would only expose a 'white list' of devices
>   when unshared. a possible 'white list' is:
>
>       /dev/null
>       /dev/full
>       /dev/zero
>       /dev/rtc
>       /dev/random
>       /dev/pts/*
>
>   do we require an extra namespace for /dev/pts/* to handle its
>   virtualization or can this be done directly in the device namespace?
>
>   check that /dev/pts can be mounted multiple times.
>
> * time
>
>   to do
>
>   required for C/R
>   will only make sense in a "closed" environment
>   the purpose is to keep the monotonic timers from expiring when
>   you restart
>
> * other possible namespace?
>
>   rtc? which is an isolation issue and also a sysctl issue
>
>   comment from eric:
>     a redesign of lsm, a la netfilter, could cover all
>     isolation needs.
>
> * namespace management
>
>   . entering
>
>     no consensus on how this should be done.
>
>     probably because the need is related to a container and not just
>     namespaces. it should be solved with a container object and
>     probably a subsystem.
>
>     serge's proposal of sys_hijack() is interesting but will require
>     more study because, in UNIX, it's not natural for a child process
>     to have 2 parents!
>
>   . extending clone to support more flags
>
>     new syscall proposal for a clone2(struct clone2_arg_struct* args)
>
> * tests
>
>   . ltp for unit tests
>   . keep the integration tests in each container framework.
>
> * Filesystems
>
>   . unprivileged mounts (not addressed)
>
>     merged
>
>   . multiple /sys mounts (in dev)
>
>     missing some bits (eric working on it) to decouple sysfs and
>     kobjects
>
>   . multiple /proc mounts (to complete)
>
>     multiple mount done
>     to limit access to /proc files, use the user namespace checks?
>     for the contents of each file, use the current context to identify
>     the namespace
>
> * Console
>
>   . a running getty should be covered by tty namespace
>   . printk will require some support to be isolated.
>
> = Task Container (from container dev plan)
> ======================================================================
>
> * base features
>
>   hierarchical/virtualized containers
>   support vserver mgmt of sub-containers
>   locking cleanup
>   control file API simplification
>   unified container including namespaces
>
>   the "container"/"task container" name is ambiguous and it should
>   change to "control group"
>
> * userspace RBCE to provide controls for
>
>   users
>   groups
>   pgrp
>   executable
>
> * specific containers targeted:
>
>   split cpusets into
>     cpuset
>     memset
>   network
>     connect/bind/accept controller using iptables
>
>   controllers:
>
>     memory controller (see detail below)

The status is that the patches for memory control are in -mm. We need
to stabilize the patches and look at reducing the overhead of the
double LRU for root containers. We need to explore the double LRU
approach as a longer term solution.

>     cpu controller
>
>       Status:
>       - Extensions required to CFS core for supporting
>         group-scheduling aspects are mostly there (in
>         mainline)
>
>       Todo:
>       - Better SMP group-fairness
>       - Hard-limit cpu usage
>       - SCHED_FIFO-like policy for groups
>       - Group priorities (?)
>
>     io controller (see detail below)
>
>     network flow id control
>
>     per-container OOM handler (userspace)
>
>     per-container swap
>
>     per-container disk I/O scheduling
>
>     per-container memory reclaim
>
>     per-container dirty page (write throttling) limit.
>
>     network rate limiting (outbound) based on container
>
> * misc
>
>   User-level APIs to identify the resource limits allowed to a
>   job, for example, how much physical memory a process can use.
>   This should integrate seamlessly with non-container environments
>   as well (maybe with ulimit).
>
>   Per-container stats, like pages on the active list, cpu usage, etc.
>
> = Resource Management (from container dev plan)
> ======================================================================
>
> * memory controller
>
>   users and requirements:
>
>   1. The containers solution would need resource management
>      (including memory control and per-container swap files). Paul
>      Menage, YAMAMOTO Takashi, Peter Zijlstra and Pavel Emelianov have
>      all shown interest in the memory controller patches.
>
>   2. The memory controller can account for page cache as well, so
>      anyone interested in limiting page cache usage can
>      theoretically move all page-cache-hungry applications under
>      the same container.
>
>   Planned enhancements to the memory controller:
>   1. Improved shared page accounting
>   2. Improved statistics
>   3. Soft-limit memory usage
>
>   generic infrastructure work:
>   1. Enhancing containerstats
>      a. Working on per-controller statistics
>      b. Integrating taskstats with containerstats
>   2. CPU accounting framework
>      a. Migrate the accounting to be more precise
>
> * cpu controller
>
>   users and requirements:
>
>   1. Virtualization solutions like containers and KVM need CPU
>      control. KVM, for example, would like to have both limits and
>      guarantees supported by a CPU controller, to control CPU
>      allocation to a particular instance.
>   2. Workload management products would like to exploit this for
>      providing guaranteed cpu bandwidth and also (hard/soft)
>      limiting cpu usage.
>
>   work items:
>   1. Fine-grained proportional-share fair-group scheduling
>   2. More accurate SMP fairness
>   3. Hard limit
>   4. SCHED_FIFO type policy for groups
>   5. Improved statistics and debug facility for group scheduler
>
> * io controller
>
>   users and requirements:
>
>   1. At a talk presented to the Linux Foundation (OSDL), the
>      attendees showed interest in an IO controller to control the IO
>      bandwidth of various filesystem operations (backup,
>      journalling, etc).
>
>   work items:
>   1. Proof-of-concept IO controller and community discussion/feedback
>   2. Development and integration of the IO controller with containers
>
>   open issues:
>   1. Automatic tagging/resource classification engine
>
> = Checkpoint/Restart
> ======================================================================
>
> * need to unify the freezer to reach a quiescence point
>
> * overall strategy:
>   . checkpoint: in kernel
>   . restart: first recreate the process tree, then let each
>     process restart itself
>
> * possible direction for C/R user api
>   . checkpoint/restart syscalls
>   . C/R file systems
>     solves the set id issue
>     elegant but exposes too much of the ABI
>
>     example:
>
>     .
>     |-- 0x00003002
>     |   |-- 0x00003002
>     |   |   |-- attr
>     |   |   |-- signal
>     |   |   |-- signal.altstack
>     |   |   |-- signal.pending
>     |   |   |-- thread
>     |   |   |-- thread.frame
>     |   |   |-- timers
>     |   |   |-- tls
>     |   |   `-- wait.zombies
>     |   |-- aio
>     |   |-- attr
>     |   |-- fds
>     |   |-- ldt
>     |   |-- mem.segments
>     |   |-- numa
>     |   |-- process
>     |   |-- signal.action
>     |   |-- signal.pending
>     |   |-- sysv.semadj
>     |   |-- sysv.shmcount
>     |   `-- thread.list
>     |-- af_inet_listening
>     |-- af_inet_orphan_count
>     |-- af_inet_orphan_data
>     |-- af_inet_orphan_info
>     |-- files
>     |   |-- 0
>     |   |-- 1
>     |   |-- 10137663680
>     |   |-- 1014250cdc0
>     |   |-- 2
>     |   `-- stdios
>     |-- sysv.msq
>     |-- sysv.sem
>     `-- sysv.shm
>
> * memory C/R
>
>   critical for performance
>   per-container swapfile?
>
> * subsystem C/R API
>
>   keep it on the side for the moment, in <subsys>_cr.c, to identify
>   the needs of each subsystem before asking for the maintainers'
>   comments
>
>   possible cr_ops in some objects (like for network protocols) but
>   also ops 'a la' virt_ops to prepare for different C/R strategies:
>   brutal, incremental, live migration
>
> * setting ids back to what they were
>
>   possible global syscall to set ids of pid, ipc, pts.
>   else use the C/R fs
>
> * statefile format
>
>   no big issues. let's pick one.
>
> * optimization
>
>   parallel C/R
>
> _______________________________________________
> Containers mailing list
> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linux-foundation.org/mailman/listinfo/containers

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL