You might want to look at Ceph (http://ceph.newdream.net/), which also partitions data by hashing. See in particular the paper "CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data": http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

Some of those ideas added to glusterfs could be very cool.

B.

On Wed, Mar 19, 2008 at 3:17 AM, Daniel van Ham Colchete <daniel.colchete@xxxxxxxxx> wrote:
> Hello y'all!
>
> I've been away from the community lately, as I had to focus on some
> other stuff here at work, but I'm still really anxious for GlusterFS
> 1.3.8 to be released so I can resume my tests and production
> environments again :).
>
> As I was going to bed, thinking about the millions of small files
> involved in a problem I'm trying to solve here, I had this idea:
>
> The Partitioner Scheduler
> ===================
> One-liner: it's a scheduler that chooses which file goes to which
> server based on an 8-bit hash of its name (or of its path inside the
> Gluster mount plus its name).
>
> What do you gain? You know where to look for the file.
>
> Picture this: 10 servers, each with 8 HDs in a 3TB RAID6, unified
> (no AFR). Question: where is file X? Unify would send a request to
> everyone asking "do you have it?", so you probably end up with 80
> hard drive heads seeking for the directory index sector. That's
> really bad when you're dealing with small files; it's like
> everything stopping for 50ms because of a single file lookup.
> Instead, send the request to the server where the file should be,
> and only if it isn't there, ask everybody else.
>
> Implementation: take a really fast, well-distributed 8-bit hash;
> basically you're splitting your files across 256 possible computers.
> You only have 2? Server one takes hashes 0-127, server two takes
> 128-255.
>
> Question: when a file isn't at the expected server, should we move
> it there? I don't know; I can always imagine a completely crazy
> Unify+AFR situation where someone could screw things up if he really
> put his mind to it. But if we don't move it, upgrading the cluster
> would mean having the old problem back, at least in the beginning.
>
> Problem: I'm assuming the servers are pretty much alike.
> Solution: use a bigger hash and add weights to the hash
> distribution.
>
> Problem 2: although the name explains how it works, I think there
> was another thing using the same name in the storage area, but I
> can't remember what ;-)... Two different things, same name: not
> good... Solution: the Colchete Scheduler? Just kidding... hahaha
> =====================
>
> The idea is not really original; look at what Google's Bigtable does
> to be scalable. PostgreSQL and Oracle also gain a lot by knowing
> exactly where to look for a piece of information. You could still
> run a partitioned Unify over a set of 3-AFR or 2-AFR volumes for
> increased reliability.
>
> Well, I don't think you would really care about this with fewer than
> about 6 servers. And if you have a small number of big files it
> wouldn't be very useful either, but that's the easy case everywhere.
>
> Comments?
>
> Best regards,
> Daniel Colchete
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
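
For anyone who wants to play with the idea, here is a minimal sketch in C of the partitioning rule Daniel describes: fold a path down to 8 bits and split the 256 buckets across the servers. The hash choice (FNV-1a) and every name below are assumptions made for illustration, not actual GlusterFS scheduler code.

/* part_sched.c - illustrative sketch only, not GlusterFS code.
 * FNV-1a is an arbitrary pick for the "really fast, well-distributed"
 * hash the proposal calls for; any comparable hash would do. */
#include <stdint.h>
#include <stdio.h>

/* Fold a path down to 8 bits, giving 256 possible buckets. */
static uint8_t bucket_of(const char *path)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (; *path; path++) {
        h ^= (uint8_t)*path;
        h *= 16777619u;                  /* FNV-1a prime */
    }
    /* XOR-fold the 32-bit hash into 8 bits. */
    return (uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}

/* Split the 256 buckets evenly across nservers: with 2 servers,
 * server 0 owns buckets 0-127 and server 1 owns 128-255. */
static int server_of(const char *path, int nservers)
{
    return (int)bucket_of(path) * nservers / 256;
}

int main(void)
{
    const char *path = "/exports/mail/user42/cur/msg001";
    printf("%s -> server %d of 4\n", path, server_of(path, 4));
    return 0;
}

Handling unequal servers the way Daniel suggests would then just mean replacing the even split in server_of() with a lookup table that assigns each server a slice of the 0-255 range proportional to its weight.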