Re: [PATCH 00/80] Kernel based checkpoint/restart [v18]

Oren Laadan <orenl@xxxxxxxxxxx> · Thu, 01 Oct 2009 15:02:42 -0400

Serge E. Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano@xxxxxxx):
>> Serge E. Hallyn wrote:
>>> Quoting Daniel Lezcano (daniel.lezcano@xxxxxxx):
>>>   
>>>> Dan Smith wrote:
>>>>     
>>>>> The header file makes it pretty clear what is going on,       
>>>> Certainly for you.
>>>>     
>>> If you're worried about hooking lxc-restart up and that
>>> being a mess, 
>> Yep, I am worried about that too :)
>>
>>> I have said that as soon as something hits -mm,
>>> I will hook up lxc-restart.  I do agree, the userspace code
>>> would be much simpler if we didn't need to do all of the
>>> process tree creation in userspace :)
>> Yes and I know there were discussions about this point several times for  
>> the proctree, I won't argue with kernel vs user proctree creation.
>> But what I understood is you will continue to parse the statefile to  
>> recreate some other resources like a subset of the network and here I am  
>> lost.
> 
> If network devices end up being recreated in userspace - either the
> ones for the root restarted container, or all the devices including
> for any child network namespaces - then I believe they will be
> considered container objects.  All the container information is at the
> top of the checkpoint file, so the program coordinating the restart
> will see all of the information before the task hierarchy.  Actually, I
> thought linux-2.6/Documentation/checkpoint/readme.txt used to
> explicitly show a 'container information' section between the
> image header and task hierarchy.  Oren?

It used to. I'll put it back there, now that somebody cares :)

But yes, the idea is that the part in which userspace is involved
is at the beginning of the image file, to avoid kernel-userspace
bouncing.
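
To make the ordering argument concrete, here is a rough sketch of the
check I have in mind. The section names and the in-memory layout below
are made up for illustration and are NOT the actual image format; the
split between "userspace" and "kernel" sections is likewise an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section kinds; NOT the real checkpoint image format. */
enum section_kind {
    SEC_IMAGE_HEADER,
    SEC_CONTAINER_INFO,
    SEC_TASK_HIERARCHY,
    SEC_TASK_STATE          /* per-task state, restored in the kernel */
};

/* Assumption of this sketch: everything up to and including the task
 * hierarchy is consumed by the userspace coordinator. */
static int handled_in_userspace(enum section_kind kind)
{
    return kind != SEC_TASK_STATE;
}

/* Count how often control would pass between userspace and the kernel
 * while the sections are read in order. */
static size_t count_handoffs(const enum section_kind *layout, size_t n)
{
    size_t handoffs = 0;

    assert(layout != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (handled_in_userspace(layout[i]) !=
            handled_in_userspace(layout[i - 1]))
            handoffs++;
    }
    return handoffs;
}

/* No bouncing: start in userspace, hand off to the kernel at most once. */
static int layout_avoids_bouncing(const enum section_kind *layout, size_t n)
{
    if (n == 0)
        return 1;
    return handled_in_userspace(layout[0]) && count_handoffs(layout, n) <= 1;
}

/* Userspace sections first, then kernel sections. */
static const enum section_kind good_layout[] = {
    SEC_IMAGE_HEADER, SEC_CONTAINER_INFO, SEC_TASK_HIERARCHY,
    SEC_TASK_STATE, SEC_TASK_STATE
};

/* Container info buried between kernel sections forces extra handoffs. */
static const enum section_kind bouncing_layout[] = {
    SEC_IMAGE_HEADER, SEC_TASK_HIERARCHY, SEC_TASK_STATE,
    SEC_CONTAINER_INFO, SEC_TASK_STATE
};
```

With the userspace sections grouped at the front there is exactly one
handoff; the second layout needs three.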

> 
>> Who in the linux community will understand what is checkpointed and what  
>> is restored from the kernel or from the userspace ?

Interested developers, who can look at the code and read the
documentation. Assuming suitable tools are available for users,
who else would care?

> 
> It should all be documented in linux-2.6/Documentation/checkpoint/.  But
> right now it's not even settled whether process creation in userspace
> is going to be the final acceptable way, so documenting speculation
> about how we're going to do network devices seems almost certain to
> end up not matching reality.

Here is my view on the subject (I'll add it to the documentation).

* By design, we do everything in the kernel, unless there is a
strong reason to move something to userspace.

* Keep a clear boundary around what we move to userspace. Avoid
doing something partly in kernel and partly in userspace.

* Streamline logic, so the execution flow doesn't bounce in and
out of the kernel. What's done in userspace appears _first_ in
the image file.

* My rule of thumb is to move something to userspace only if both
of these hold:

1. It can be done _easily_ and _uniformly_ in userspace, _and_

2. Doing it in userspace gives us _substantial_ flexibility,
 coverage, or portability that is difficult to get in the kernel.
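
Spelled out as code, the rule is a conjunction of (1) with a
disjunction inside (2). This is only a sketch: the struct, the field
names, and the yes/no answers filled in for the three sample resources
are my own shorthand, not anything that exists in the patch set.

```c
#include <stdbool.h>

/* Hypothetical yes/no answers for one kind of resource. */
struct placement_criteria {
    bool easy_in_userspace;      /* rule 1 */
    bool uniform_in_userspace;   /* rule 1 */
    bool gains_flexibility;      /* rule 2, must be substantial */
    bool gains_coverage;         /* rule 2, must be substantial */
    bool gains_portability;      /* rule 2, must be substantial */
};

/* Default is the kernel; userspace only if rule 1 AND rule 2 hold. */
static bool restore_in_userspace(const struct placement_criteria *c)
{
    bool rule1 = c->easy_in_userspace && c->uniform_in_userspace;
    bool rule2 = c->gains_flexibility || c->gains_coverage ||
                 c->gains_portability;

    return rule1 && rule2;
}

/* Illustrative answers only. Open files fail rule 1 outright. */
static const struct placement_criteria open_files = {
    .easy_in_userspace = false,
    .uniform_in_userspace = false,
};

static const struct placement_criteria tasks_tree = {
    .easy_in_userspace = true,
    .uniform_in_userspace = true,
    .gains_flexibility = true,
    .gains_portability = true,
};

static const struct placement_criteria network_setup = {
    .easy_in_userspace = true,
    .uniform_in_userspace = true,
    .gains_flexibility = true,
    .gains_portability = true,
};
```

Note that failing rule 1 is enough to keep something in the kernel, no
matter how attractive the gains under rule 2 would be.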

Let me demonstrate this thinking with 3 examples: open files (in
kernel), tasks tree (userspace), and network namespaces (userspace).

Why not restore open files in userspace?  Because some file types
are hard to restore from userspace, and we want uniform handling
of all file types. It also becomes very tricky to do in userspace in
the presence of mount points, chroots, and mount namespaces.

Why restore the tasks tree in userspace?  Because it's easy and
portable (using the new clone); because it is independent of where
and how we restore other resources; and because it lets the
restarting tasks do useful work after they are created but before
they call sys_restart.
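
A minimal sketch of that flow, under loud assumptions: it uses plain
fork() rather than the new clone, so it does not restore pids; the
task_node structure and the hook type are invented for illustration;
and placeholder_restart() merely stands in for the restart system call
and does nothing. Reaping the children at the end exists only so the
sketch can report success or failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

/* Hypothetical description of one task in the saved hierarchy. */
struct task_node {
    int id;                             /* label for illustration only */
    size_t nchildren;
    const struct task_node *children;
};

/* Work a task may do after creation but before entering the kernel. */
typedef int (*pre_restart_hook)(const struct task_node *self);

/* Stand-in for the restart system call; it does nothing here. */
static int placeholder_restart(const struct task_node *self)
{
    (void)self;
    return 0;
}

/* Runs inside a freshly created task. Returns 0 only if this task
 * and all of its descendants succeeded. */
static int run_task(const struct task_node *self, pre_restart_hook hook)
{
    pid_t pids[MAX_CHILDREN];
    size_t created = 0;
    int failed = 0;

    assert(self->nchildren <= MAX_CHILDREN);

    /* 1. Create this task's children. */
    for (size_t i = 0; i < self->nchildren; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(run_task(&self->children[i], hook) ? 1 : 0);
        pids[created++] = pid;
    }

    /* 2. Do useful work while still an ordinary process. */
    if (hook != NULL && hook(self) != 0)
        failed = 1;

    /* 3. Only then hand over to the kernel (placeholder). */
    if (!failed && placeholder_restart(self) != 0)
        failed = 1;

    /* Sketch only: collect the children's exit status. */
    for (size_t i = 0; i < created; i++) {
        int status;

        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    return failed;
}

/* Coordinator: fork the root task and wait for the whole tree. */
static int restore_tree(const struct task_node *root, pre_restart_hook hook)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(run_task(root, hook) ? 1 : 0);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Demo tree: root 0 has children 1 and 2; task 1 has child 3. */
static const struct task_node demo_grandchildren[] = { { 3, 0, NULL } };
static const struct task_node demo_children[] = {
    { 1, 1, demo_grandchildren },
    { 2, 0, NULL },
};
static const struct task_node demo_root = { 0, 2, demo_children };

static int hook_ok(const struct task_node *self)
{
    (void)self;
    return 0;
}

static int hook_fail_on_task3(const struct task_node *self)
{
    return self->id == 3 ? -1 : 0;
}
```

The point is step 2: between creation and the restart call each task is
a normal process, so a hook can do whatever setup it needs, and a
failure there propagates up before anything enters the kernel.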

Why restore network namespaces (their setup) in userspace?  Because
it's easy and portable, and network setup tools are well developed
and understood; because we will want some policy there (e.g. restart
doesn't care which actual device is used, and we don't want that
decision made in the kernel); because we don't want to replicate the
rich high-level userspace tools inside the kernel; and because you
may want to change the configuration compared to what it was at
checkpoint (e.g. add a firewall).

(That said, none of this is set in stone; if you have strong
arguments to the contrary, now is a good time to make them.)

> 
>> Does this imply someone has to use a specific tool like "restart.c"  
>> within its own tools, assuming this tool is installed in the system or  
>> shall he copy-paste the code of the GPL licensed restart.c to its LGPL  
>> licensed tools ?
> 
> Hmm, I think a tiny little lgpl library, maybe even shipping under the
> kernel tree, implementing a generic, whole-container and sub-tree
> checkpoint and restart, makes very good sense.
> 
> It certainly does NOT make sense to require multiple projects to track
> all changes to the checkpoint image format as the kernel changes...
> 

The idea is to add a plugin architecture to the userspace restart
tool, so that users can run any useful work before and after the tasks
tree is created, but prior to calling sys_restart().
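
Nothing like this exists yet, so take the following as one possible
shape rather than a design. The restart_plugin structure, the callback
names, and the demo plugins are all hypothetical; the tree builder is
passed in as a callback, and the sketch stops where a real tool would
call sys_restart().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin interface; no such API exists in the tool today. */
struct restart_plugin {
    const char *name;
    int (*before_tree)(void *ctx);  /* before the tasks tree is created */
    int (*after_tree)(void *ctx);   /* after creation, before sys_restart() */
};

typedef int (*tree_builder)(void *ctx);

/* Run all "before" callbacks, build the tree, then run all "after"
 * callbacks, in registration order. Stops at the first failure. */
static int restart_with_plugins(const struct restart_plugin *plugins,
                                size_t n, tree_builder build_tree, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        if (plugins[i].before_tree && plugins[i].before_tree(ctx) != 0)
            return -1;

    if (build_tree(ctx) != 0)
        return -1;

    for (size_t i = 0; i < n; i++)
        if (plugins[i].after_tree && plugins[i].after_tree(ctx) != 0)
            return -1;

    /* A real tool would call sys_restart() here; the sketch stops. */
    return 0;
}

/* Demo context: callbacks record the order in which they ran. */
struct trace {
    char log[16];
};

static int trace_append(void *ctx, char c)
{
    struct trace *t = ctx;
    size_t len = strlen(t->log);

    if (len + 1 >= sizeof(t->log))
        return -1;
    t->log[len] = c;
    t->log[len + 1] = '\0';
    return 0;
}

static int net_before(void *ctx)      { return trace_append(ctx, 'n'); }
static int net_after(void *ctx)       { return trace_append(ctx, 'N'); }
static int fw_after(void *ctx)        { return trace_append(ctx, 'F'); }
static int demo_build_tree(void *ctx) { return trace_append(ctx, 'T'); }

/* Two made-up plugins: network setup and a firewall added at restart. */
static const struct restart_plugin demo_plugins[] = {
    { "network-setup", net_before, net_after },
    { "firewall",      NULL,       fw_after  },
};
```

A plugin may leave either callback NULL, so something like the firewall
example only needs to hook the phase it cares about.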

And yes, eventually this should become a library, too.

Oren.
