On Tue, 16 Feb 2016 00:13:34 -0500
Valdis.Kletnieks@xxxxxx wrote:

> On Tue, 16 Feb 2016 10:18:26 +0530, Nitin Varyani said:
>
> > 1) Sending process context via network
>
> Note that this is a non-trivial issue by itself. At a *minimum*,
> you'll need all the checkpoint-restart code. Plus, if the process
> has any open TCP connections, *those* have to be migrated without
> causing a security problem. Good luck on figuring out how to properly
> route packets in this case - consider 4 nodes 10.0.0.1 through 10.0.0.4,
> you migrate a process from 10.0.0.1 to 10.0.0.3, How do you make sure
> *that process*'s packets go to 0.3 while all other packets still go to
> 0.1. Also, consider the impact this may have on iptables, if there is
> a state=RELATED,CONNECTED on 0.1 - that info needs to be relayed to 0.3
> as well.
>
> For bonus points, what's the most efficient way to transfer a large
> process image (say 500M, or even a bloated Firefox at 3.5G), without
> causing timeouts while copying the image?
>
> I hope your research project is *really* well funded - you're going
> to need a *lot* of people (Hint - find out how many people work on
> VMWare - that should give you a rough idea)

I wouldn't see things quite that darkly. It is also an interesting
puzzle.

To migrate processes I would pick an already existing solution, such
as the ones available for containers. So every process should, if
possible, run in a container. To migrate them efficiently without
some form of distributed shared memory, you might want to look at
userfaultfd.

Now back to the scheduling: I do not think that every node should keep
track of every process on every other node, as this would require a
massive amount of communication and hurt scalability. So you would
either implement something like work stealing or go for a central
entity like Mesos, which could do the process/job/container scheduling
for you.

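To make the work-stealing idea a bit more concrete, here is a minimal
sketch in Python. Everything in it (the Node class, run_next, the job
names) is made up for illustration and is not taken from any existing
scheduler. It only models the queueing policy: a node runs its own jobs
first and looks at its peers' queue lengths only once it is idle.
Actually moving a job to another node (checkpointing, transfer,
restore) is left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster node with a local run queue (hypothetical model)."""
    name: str
    queue: deque = field(default_factory=deque)

    def run_next(self, peers):
        """Return the next job to run, stealing from a peer if idle."""
        if self.queue:
            # Local work is taken from the head of our own queue.
            return self.queue.popleft()
        # Idle: pick the most loaded peer instead of tracking every job.
        victim = max(peers, key=lambda n: len(n.queue), default=None)
        if victim is None or not victim.queue:
            return None  # nothing to steal anywhere
        # Steal from the tail, i.e. the job the victim would reach last.
        return victim.queue.pop()


def demo():
    """Tiny example: an idle node steals one job from a busy node."""
    busy = Node("10.0.0.1", deque(["job1", "job2", "job3"]))
    idle = Node("10.0.0.3")
    stolen = idle.run_next([busy])   # taken from the tail of busy's queue
    local = busy.run_next([idle])    # busy keeps working from its head
    return stolen, local, list(busy.queue)
```

The point of the design is that a node only talks to others when it has
nothing to do, so the communication cost grows with the number of idle
nodes rather than with the number of processes in the cluster.
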
There are now two pitfalls, each of which is hard enough on its own:

- Interprocess communication between two processes over anything other
  than a socket. In such a case you would probably need to merge the
  two distinct containers.
- Dedicated hardware.

Dominik