This might be of interest to you: https://www.criu.org
On Thu, Oct 4, 2018 at 8:23 PM <valdis.kletnieks@xxxxxx> wrote:
On Thu, 04 Oct 2018 21:44:14 +0300, Boian Karatotev said:
> I am a Computer Science student and for my last year I need to make and
> present a 'diploma project' at the end of June. So far I want to make a
> kernel module, whose description is in the following paragraph. I feel
> comfortable with C and my OS knowledge is maybe slightly better than my OS
> course. My question is: Would it be possible to pull this off? I have no
> experience with the kernel and I want to get into kernel development, so
> this would be a perfect opportunity for that. My only issue is that this
> may be too complex for my experience.
> My idea: Something along the lines of checkpoint-restart as a kernel
> module. I want to ultimately be able to migrate a running process to a
> different machine (assuming at least some basic similarity). I know of
> BLCR <http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/>
> and I am planning on using it as a guide, although I am unsure about
> working on it directly. As far as I know, the grading process does not
> require this to be 100% complete, so I am aiming at transferring at least
> all the memory, restoring file descriptors and maybe child
> processes/threads.
You mean you want to re-invent the current checkpoint-restart code that's been
in the kernel since v3.10 back in June 2013? (see kernel/kcmp.c for the gory
details).
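
To get a feel for how much per-process state is involved, here is a minimal
sketch (Python, Linux-only, reading /proc) that lists a process's open file
descriptors and memory mappings. The helper names list_open_fds and
list_mappings are made up for illustration. This is only a userspace look at
the kind of state a checkpoint has to record, not how the kernel code does it.

```python
import os


def list_open_fds(pid="self"):
    """Return {fd_number: link_target} for every open descriptor of a process."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor went away between listdir() and readlink(),
            # e.g. the one listdir() itself used to read the directory.
            continue
    return result


def list_mappings(pid="self"):
    """Return (address_range, perms, path) for each line of /proc/<pid>/maps."""
    mappings = []
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # Columns: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            path = fields[5].strip() if len(fields) > 5 else ""
            mappings.append((fields[0], fields[1], path))
    return mappings


if __name__ == "__main__":
    for fd, target in sorted(list_open_fds().items()):
        print(fd, target)
    for addr, perms, path in list_mappings():
        print(addr, perms, path)
```

Every entry printed is something a restore would have to recreate: each
mapping with the same address and permissions, each descriptor pointing at an
equivalent object with the same number.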
Note that migrating a running process to a different machine is a *lot*
trickier, especially if it has things like open files or network connections.
"Assume at least some basic similarity" isn't anywhere *near* good enough - if
the process has /home/fred/wombats/my_terabyte_database open, you're going to
need to have it at the same place in the filesystem and data synced across to
the new target (particularly fun if the process scribbles some more on the file
while you're busy migrating it, or if it hasn't done an fsync). Similarly, if
it has a TCP connection open to someplace else, you're going to have to figure
out what to do with the IP 4-tuple and sequence numbers to avoid breaking the
connection. And if it's HPC software using MPI configured to do RDMA over
Infiniband, that's even uglier....
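
As an illustration of the TCP part, a small sketch (Python, loopback only; the
helper names tcp_four_tuple and loopback_pair are invented for this example)
that prints the 4-tuple identifying a connection as seen from each end. Only
the addresses and ports are visible this way. The sequence numbers, window
state and queued data live in the kernel and cannot be read through these
calls, and Linux has a separate TCP_REPAIR socket option intended for that
job, which this sketch does not touch.

```python
import socket


def tcp_four_tuple(sock):
    """Return (local_ip, local_port, remote_ip, remote_port) for a connected socket."""
    local_ip, local_port = sock.getsockname()[:2]
    remote_ip, remote_port = sock.getpeername()[:2]
    return (local_ip, local_port, remote_ip, remote_port)


def loopback_pair():
    """Open a TCP connection to ourselves over loopback; return both ends."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


if __name__ == "__main__":
    client, server_side = loopback_pair()
    print("client view:", tcp_four_tuple(client))
    print("server view:", tcp_four_tuple(server_side))
    client.close()
    server_side.close()
```

The two views are mirror images of each other. The remote peer keeps
addressing the old IP and port after a migration, so the restored socket on
the new machine has to answer to exactly the same tuple.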
In fact, migrating an entire virtual machine is easier than migrating one
process, because you don't have to worry about recovering the process state,
that's all in kernel memory that you migrate with the VM. Move the VM, take
down the IP on the old hypervisor, set up the IP on the new one, toss out a
gratuitous ARP packet so other machines on the subnet notice, and you're ready
to go...
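
For the curious, a gratuitous ARP is just an ordinary ARP packet in which the
sender and target IP are the same address. A minimal sketch in Python that
builds (but does not send) such a frame is shown here. The function name
gratuitous_arp_frame is made up, and the MAC and IP are placeholder values
from the locally-administered and documentation ranges. Actually sending it
would need a raw socket and root, which is left out.

```python
import socket
import struct


def gratuitous_arp_frame(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request that announces `ip` at `mac`."""
    if len(mac) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, EtherType ARP.
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4, opcode 1 (request).
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Sender MAC/IP, then target MAC (all zero) and target IP == sender IP.
    arp_body = mac + ip_bytes + b"\x00" * 6 + ip_bytes

    return eth_header + arp_header + arp_body


if __name__ == "__main__":
    # Placeholder addresses, purely for illustration.
    frame = gratuitous_arp_frame(bytes.fromhex("020000000001"), "192.0.2.10")
    print(len(frame), frame.hex())
```

Neighbours that already have the IP in their ARP cache update the entry to
the new MAC when they see this, which is what lets traffic follow the VM to
the new hypervisor.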
There's a *reason* why VMware gets away with charging lots of money for their
enterprise-class software that supports migrating a live VM across hypervisors.
It's a lot harder to do than you think.
_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies