On Thu, 2009-02-12 at 14:10 -0800, Andrew Morton wrote:
> On Thu, 12 Feb 2009 13:51:23 -0800
> Dave Hansen <dave@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Thu, 2009-02-12 at 11:42 -0800, Andrew Morton wrote:
> > > On Thu, 12 Feb 2009 13:30:35 -0600
> > > Matt Mackall <mpm@xxxxxxxxxxx> wrote:
> > >
> > > > On Thu, 2009-02-12 at 10:11 -0800, Dave Hansen wrote:
> > > > >
> > > > > > - In bullet-point form, what features are missing, and should be added?
> > > > >
> > > > >  * support for more architectures than i386
> > > > >  * file descriptors:
> > > > >   * sockets (network, AF_UNIX, etc...)
> > > > >   * device files
> > > > >   * shmfs, hugetlbfs
> > > > >   * epoll
> > > > >   * unlinked files
> > > > >  * Filesystem state
> > > > >   * contents of files
> > > > >   * mount tree for individual processes
> > > > >  * flock
> > > > >  * threads and sessions
> > > > >  * CPU and NUMA affinity
> > > > >  * sys_remap_file_pages()
> > > >
> > > > I think the real question is: where are the dragons hiding? Some of
> > > > these are known to be hard. And some of them are critical to
> > > > checkpointing typical applications. If you have plans or theories for
> > > > implementing all of the above, then great. But this list doesn't
> > > > really give any sense of whether we should be scared of what lurks
> > > > behind those doors.
> > >
> > > How close has OpenVZ come to implementing all of this? I think the
> > > implementation is fairly complete?
> >
> > I also believe it is "fairly complete". At least able to be used
> > practically.
> >
> > > If so, perhaps that can be used as a guide. Will the planned feature
> > > have a similar design? If not, how will it differ? To what extent can
> > > we use that implementation as a tool for understanding what this new
> > > implementation will look like?
> >
> > Yes, we can certainly use it as a guide. However, there are some
> > barriers to being able to do that:
> >
> > dave@nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | diffstat | tail -1
> >  628 files changed, 59597 insertions(+), 2927 deletions(-)
> > dave@nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | wc
> >   84887  290855 2308745
> >
> > Unfortunately, the git tree doesn't have that great of a history. It
> > appears that the forward-ports are just applications of huge single
> > patches which then get committed into git. This tree has also
> > historically contained a bunch of stuff not directly related to
> > checkpoint/restart like resource management.
> >
> > We'd be idiots not to take a hard look at what has been done in OpenVZ.
> > But, for the time being, we have absolutely no shortage of things that
> > we know are important and know have to be done. Our largest problem is
> > not finding things to do, but is our large out-of-tree patch that is
> > growing by the day. :(
>
> Well we have a chicken-and-eggish thing. The patchset will keep
> growing until we understand how much of this:
>
> > dave@nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | diffstat | tail -1
> >  628 files changed, 59597 insertions(+), 2927 deletions(-)
>
> we will be committed to if we were to merge the current patchset.

Here's the measurement that Alexey suggested:

dave@nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... kernel/cpt/ | diffstat
 Makefile        |   53 +
 cpt_conntrack.c |  365 ++++++++++++
 cpt_context.c   |  257 ++++++++
 cpt_context.h   |  215 +++++++
 cpt_dump.c      | 1250 ++++++++++++++++++++++++++++++++++++++++++
 cpt_dump.h      |   16
 cpt_epoll.c     |  113 +++
 cpt_exports.c   |   13
 cpt_files.c     | 1626 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cpt_files.h     |   71 ++
 cpt_fsmagic.h   |   16
 cpt_inotify.c   |  144 ++++
 cpt_kernel.c    |  177 ++++++
 cpt_kernel.h    |   99 +++
 cpt_mm.c        |  923 +++++++++++++++++++++++++++++++
 cpt_mm.h        |   35 +
 cpt_net.c       |  614 ++++++++++++++++++++
 cpt_net.h       |    7
 cpt_obj.c       |  162 +++++
 cpt_obj.h       |   62 ++
 cpt_proc.c      |  595 ++++++++++++++++++++
 cpt_process.c   | 1369 ++++++++++++++++++++++++++++++++++++++++++++++
 cpt_process.h   |   13
 cpt_socket.c    |  790 ++++++++++++++++++++++++++
 cpt_socket.h    |   33 +
 cpt_socket_in.c |  450 +++++++++++++++
 cpt_syscalls.h  |  101 +++
 cpt_sysvipc.c   |  403 +++++++++++++
 cpt_tty.c       |  215 +++++++
 cpt_ubc.c       |  132 ++++
 cpt_ubc.h       |   23
 cpt_x8664.S     |   67 ++
 rst_conntrack.c |  283 +++++++++
 rst_context.c   |  323 ++++++++++
 rst_epoll.c     |  169 +++++
 rst_files.c     | 1648 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 rst_inotify.c   |  196 ++++++
 rst_mm.c        | 1151 +++++++++++++++++++++++++++++++++++++++
 rst_net.c       |  741 +++++++++++++++++++++++++
 rst_proc.c      |  580 +++++++++++++++++++
 rst_process.c   | 1640 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 rst_socket.c    |  918 +++++++++++++++++++++++++++++++
 rst_socket_in.c |  489 ++++++++++++++++
 rst_sysvipc.c   |  633 +++++++++++++++++++++
 rst_tty.c       |  384 +++++++++++++
 rst_ubc.c       |  131 ++++
 rst_undump.c    | 1007 ++++++++++++++++++++++++++++++++++
 47 files changed, 20702 insertions(+)

One important thing that leaves out is the interaction that this code
has with the rest of the kernel. That's critically important when
considering long-term maintenance, and I'd be curious how the OpenVZ
folks view it.

> Now, we've gone in blind before - most notably on the
> containers/cgroups/namespaces stuff. That hail mary pass worked out
> acceptably, I think. Maybe we got lucky. I thought that
> net-namespaces in particular would never get there, but it did.
>
> That was a very large and quite long-term-important user-visible
> feature.
>
> checkpoint/restart/migration is also a long-term-...-feature. But if
> at all possible I do think that we should go into it with our eyes a
> little less shut.

One thing Ingo has asked for that I understand a bit more clearly is a
programmatic statement of what is and is not covered by this current
code. That's certainly one eye-opening activity which I'll get to
immediately.

-- Dave
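
The per-file diffstat for kernel/cpt/ above can be split into its cpt_* and rst_* halves with a few lines of scripting. What follows is only a minimal sketch: it assumes the diffstat text is already in a string, it reads the cpt_ and rst_ prefixes as the checkpoint and restart sides (an inference from the file names, not something stated in the thread), and the function name totals_by_prefix is made up for illustration. The sample rows are copied from the listing above.

```python
import re
from collections import defaultdict

# One per-file diffstat row looks like " cpt_dump.c | 1250 +++++".
# The trailing "N files changed" summary has no "|" and is skipped.
ROW = re.compile(r"^\s*(\S+)\s*\|\s*(\d+)\b")


def totals_by_prefix(diffstat_text):
    """Sum per-file changed-line counts, grouped by filename prefix.

    Hypothetical helper: files named like "cpt_foo.c" are grouped
    under "cpt", "rst_foo.c" under "rst", everything else under "other".
    """
    totals = defaultdict(int)
    for line in diffstat_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # summary line, blank line, or anything unexpected
        name, count = match.group(1), int(match.group(2))
        prefix = name.split("_", 1)[0] if "_" in name else "other"
        totals[prefix] += count
    return dict(totals)


# A few rows copied from the kernel/cpt/ listing, plus its summary line.
sample = """\
 cpt_epoll.c     |  113 +++
 rst_epoll.c     |  169 +++++
 cpt_inotify.c   |  144 ++++
 rst_inotify.c   |  196 ++++++
 Makefile        |   53 +
 47 files changed, 20702 insertions(+)
"""

if __name__ == "__main__":
    print(totals_by_prefix(sample))
```

Feeding it the full listing instead of the sample would give the split for the whole directory. It still says nothing about changes outside kernel/cpt/, which is the interaction with the rest of the kernel that the measurement leaves out.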