New unify scheduler design proposition

"Daniel van Ham Colchete" <daniel.colchete@xxxxxxxxx> · Wed, 19 Mar 2008 00:17:56 -0300

Hello y'all!

I've been away from the community lately as I had to focus on some
other stuff here at work, but I'm still really anxious for GlusterFS
1.3.8 to be released so I can resume my tests and production
environments again :).

As I was just going to bed and thinking about millions of small files
as a solution to a problem I'm trying to solve here, I had this idea:

The Partitioner Scheduler
===================
One-liner: it's a scheduler that chooses which file goes to which
server based on an 8-bit hash of its name (or path inside the gluster
mount + name).

What do you win? You know where to look for the file.

Picture this: 10 servers, each with 8 hard drives (3TB RAID6),
unified (no AFR). Question: where is file x? Unify would send a
request to every server asking: do you have it? So you probably have
80 hard drive heads seeking for the directory index sector. That's
really bad when you're dealing with small files; it's like everything
stopping for 50ms because of a file lookup. Instead, send the request
to the server where the file should be. If it isn't there, ask
everybody else.
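
To make the lookup order concrete, here is a minimal Python sketch. It
is only an illustration, not GlusterFS code: each server is modelled as
a plain set of paths, the expected server index is passed in by the
caller, and the name lookup() is made up for this example. A real
unify would broadcast the fallback in parallel; this one probes
sequentially just to count requests.

```python
def lookup(path, servers, expected):
    """Find which server holds `path`, asking the expected one first.

    `servers` is a list of sets of paths (a stand-in for real bricks).
    `expected` is the index of the server the scheduler points at.
    Returns (server_index or None, number_of_servers_asked).
    """
    probes = 1
    if path in servers[expected]:
        return expected, probes  # common case: one request, one seek

    # Miss on the expected server: fall back to asking everybody else.
    for index, contents in enumerate(servers):
        if index == expected:
            continue
        probes += 1
        if path in contents:
            return index, probes
    return None, probes
```

In the common case only one server is asked; the full broadcast cost
is paid only when the file is not where the scheduler says it should
be.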

Implementation: get a really fast, well-distributed 8-bit hash.
Basically you're splitting your files across 256 possible computers.
You only have 2? Server one gets 0-127, server two gets 128-255.
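
A rough sketch of that mapping in Python, under some assumptions: the
low byte of CRC32 stands in for the "really fast" hash (any
well-distributed hash would do), and the function names are invented
for this example. The 256 buckets are cut into contiguous, nearly
equal ranges, one per server.

```python
import zlib

NUM_BUCKETS = 256  # size of the 8-bit hash space


def bucket_of(path):
    """Hash a path to a bucket in 0..255 (CRC32 low byte as a stand-in)."""
    return zlib.crc32(path.encode("utf-8")) & 0xFF


def bucket_to_server(bucket, num_servers):
    """Map a bucket to a server index using contiguous equal ranges."""
    return bucket * num_servers // NUM_BUCKETS


def server_for(path, num_servers):
    """Pick the server that should hold `path`."""
    return bucket_to_server(bucket_of(path), num_servers)
```

With two servers, buckets 0-127 map to server 0 and buckets 128-255
map to server 1, matching the split described above.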

Question: when a file isn't at the expected server, should we move it
there? I don't know; I can always imagine a completely crazy Unify+AFR
situation where someone could screw things up if he really puts his
mind to it. But if we don't move it, then after upgrading the cluster
(adding servers) we'd have the old problem back, at least at the
beginning.

Problem: well, I'm assuming the servers are pretty much alike.
Solution: get a bigger hash and add weights to the hash distribution.
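
One way the weighted variant could look, again as a hedged Python
sketch: a 16-bit hash space (the low 16 bits of CRC32, chosen only for
illustration) is cut into ranges proportional to per-server weights,
and a binary search finds the owning range. The weights and all
function names here are hypothetical.

```python
import bisect
import itertools
import zlib

HASH_SPACE = 1 << 16  # a "bigger hash": 65536 slots instead of 256


def build_boundaries(weights):
    """Return the exclusive upper bound of each server's hash range."""
    total = sum(weights)
    return [c * HASH_SPACE // total for c in itertools.accumulate(weights)]


def server_for_hash(hash_value, boundaries):
    """Find the server whose range contains `hash_value`."""
    return bisect.bisect_right(boundaries, hash_value)


def weighted_server_for(path, boundaries):
    """Pick the server for `path` using the weighted ranges."""
    hash_value = zlib.crc32(path.encode("utf-8")) & 0xFFFF
    return server_for_hash(hash_value, boundaries)
```

For example, weights [1, 3] give server 0 the first quarter of the
hash space and server 1 the remaining three quarters, so a server with
three times the capacity receives roughly three times the files.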

Problem 2: although the name explains how it works I think there was
another thing using the same name in the storage area, but can't
remember what ;-)... Two different things, same name, not good...
Solution: The Colchete's Scheduler? Just kidding... hahaha
=====================

The idea is not really original if you look at what Google's Bigtable
does to be scalable. PostgreSQL and Oracle also gain a lot from
knowing where to look for a piece of information. You can still put
Partitioned Unify on top of several 3-AFR or 2-AFR groups for
increased reliability.

Well, if you have fewer than 6 servers you wouldn't really care about
this, I think. If you have a small number of big files it wouldn't be
very useful either, but that's the easy case everywhere.

Comments?

Best regards,
Daniel Colchete