Re: Distributed Process Scheduling Algorithm

Nitin Varyani <varyani.nitin1@xxxxxxxxx> · Tue, 16 Feb 2016 15:16:17 +0530

According to my project requirements, I need a distributed algorithm, so Mesos will not work. Work stealing, however, looks like the best trade-off, since it will save communication costs. Thank you. Can you please elaborate on the last part of your reply?

On Tue, Feb 16, 2016 at 2:12 PM, Dominik Dingel <dingel@xxxxxxxxxxxxxxxxxx> wrote:
On Tue, 16 Feb 2016 00:13:34 -0500, Valdis.Kletnieks@xxxxxx wrote:

> On Tue, 16 Feb 2016 10:18:26 +0530, Nitin Varyani said:
>
> > 1) Sending process context via network
>
> Note that this is a non-trivial issue by itself.  At a *minimum*,
> you'll need all the checkpoint-restart code.  Plus, if the process
> has any open TCP connections, *those* have to be migrated without
> causing a security problem.  Good luck on figuring out how to properly
> route packets in this case - consider 4 nodes 10.0.0.1 through 10.0.0.4,
> you migrate a process from 10.0.0.1 to 10.0.0.3.  How do you make sure
> *that process*'s packets go to 0.3 while all other packets still go to
> 0.1?  Also, consider the impact this may have on iptables, if there is
> a state=RELATED,ESTABLISHED on 0.1 - that info needs to be relayed to 0.3
> as well.
>
> For bonus points, what's the most efficient way to transfer a large
> process image (say 500M, or even a bloated Firefox at 3.5G), without
> causing timeouts while copying the image?
>
> I hope your research project is *really* well funded - you're going
> to need a *lot* of people (Hint - find out how many people work on
> VMware - that should give you a rough idea)

I wouldn't see things that darkly. Also, this is an interesting puzzle.

To migrate processes, I would pick an existing solution, such as the ones
that exist for containers. So every process should, if possible, run in a
container. To migrate them efficiently without some form of distributed
shared memory, you might want to look at userfaultfd.
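
As a rough illustration of why userfaultfd is relevant here: it lets a userspace handler resolve page faults, so a migrated process can start on the destination before its whole memory image has arrived, and pages are fetched only when first touched (post-copy). The sketch below is a pure Python model of that idea, not a use of the real userfaultfd API. The names SourceNode, LazyMemory and fetch_page are hypothetical, and the "network" is just a method call.

```python
PAGE_SIZE = 4096  # assumed page size for this model


class SourceNode:
    """Hypothetical source host that still holds the full memory image."""

    def __init__(self, image: bytes):
        self.image = image
        self.pages_served = 0

    def fetch_page(self, page_no: int) -> bytes:
        # Stands in for one network round trip to the source node.
        self.pages_served += 1
        start = page_no * PAGE_SIZE
        return self.image[start:start + PAGE_SIZE]


class LazyMemory:
    """Destination-side memory that is populated only on first access."""

    def __init__(self, source: SourceNode, size: int):
        self.source = source
        self.size = size
        self.resident = {}  # page number -> bytearray already copied over

    def _fault_in(self, page_no: int) -> bytearray:
        # Models the fault handler: a missing page is pulled from the source.
        if page_no not in self.resident:
            self.resident[page_no] = bytearray(self.source.fetch_page(page_no))
        return self.resident[page_no]

    def read(self, addr: int) -> int:
        if not 0 <= addr < self.size:
            raise IndexError(addr)
        return self._fault_in(addr // PAGE_SIZE)[addr % PAGE_SIZE]

    def write(self, addr: int, value: int) -> None:
        if not 0 <= addr < self.size:
            raise IndexError(addr)
        self._fault_in(addr // PAGE_SIZE)[addr % PAGE_SIZE] = value


if __name__ == "__main__":
    image = bytes(i % 251 for i in range(10 * PAGE_SIZE))
    src = SourceNode(image)
    mem = LazyMemory(src, len(image))
    mem.read(5)
    mem.read(3 * PAGE_SIZE + 7)
    print("pages transferred so far:", src.pages_served, "of 10")
```

In this model only the pages the process actually touches cross the network, which is the property that makes post-copy attractive for large images. A real implementation would also have to push the remaining pages in the background and cope with the source node failing mid-migration.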

So, back to the scheduling: I do not think that every node should keep track
of every process on every other node, as this would require a massive amount
of communication and hurt scalability. So you would either implement something
like work stealing or go for a central entity like Mesos, which could do the
process/job/container scheduling for you.
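
To make the work-stealing option a bit more concrete, here is a minimal single-process simulation, assuming each node keeps only a local queue and contacts one randomly chosen peer when it runs out of work. Node, run and the round-based loop are illustrative names and simplifications, not an existing scheduler's API. Real nodes would exchange messages over the network and would migrate processes rather than integers.

```python
import random
from collections import deque


class Node:
    """One machine with a purely local task queue (no global state)."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()
        self.done = []

    def push(self, task):
        self.tasks.append(task)

    def pop_local(self):
        # The owner takes the most recently queued task from the tail.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # A thief takes the oldest task from the head of the victim's queue.
        return victim.tasks.popleft() if victim.tasks else None


def run(nodes, rng):
    """Run until all queues are empty; return the number of successful steals."""
    steals = 0
    while any(n.tasks for n in nodes):
        for node in nodes:
            task = node.pop_local()
            if task is None:
                # Idle node: ask exactly one random peer, so the communication
                # cost is one request per idle node per round.
                others = [n for n in nodes if n is not node]
                if others:
                    task = node.steal_from(rng.choice(others))
                    if task is not None:
                        steals += 1
            if task is not None:
                node.done.append(task)  # "execute" the task
    return steals


if __name__ == "__main__":
    nodes = [Node(i) for i in range(4)]
    for t in range(20):
        nodes[0].push(t)  # all work initially lands on node 0
    steals = run(nodes, random.Random())
    for n in nodes:
        print("node", n.name, "ran", len(n.done), "tasks")
    print("successful steals:", steals)
```

The point of the design is that a node only talks to a peer when it is idle, so a balanced cluster generates almost no scheduling traffic, in contrast to every node tracking every process on every other node.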

There are now two pitfalls, each hard enough on its own:

- interprocess communication between two processes over something other than a socket;
  in such a case you would probably need to merge the two distinct containers
- dedicated hardware

Dominik

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies