You might want to look at Ceph (http://ceph.newdream.net/), which also partitions data by hashing. See in particular the paper "CRUSH - Controlled, Scalable, Decentralized Placement of Replicated Data": http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

Some of those ideas added to glusterfs could be very cool.

B.

On Wed, Mar 19, 2008 at 3:17 AM, Daniel van Ham Colchete <daniel.colchete@xxxxxxxxx> wrote:
> Hello y'all!
>
> I've been away from the community lately, as I had to focus on some
> other stuff here at work, but I'm still really anxious for GlusterFS
> 1.3.8 to be released so I can resume my tests and production
> environments again :).
>
> As I was going to bed, thinking about the millions of small files
> involved in a problem I'm trying to solve here, I had this idea:
>
> The Partitioner Scheduler
> ===================
> One-liner: it's a scheduler that chooses which file goes to which
> server based on an 8-bit hash of its name (or of its path inside the
> Gluster mount plus its name).
>
> What do you gain? You know where to look for the file.
>
> Picture this: 10 servers, each with 8 HDs in a 3TB RAID6, unified
> (no AFR). Question: where is file X? Unify would send a request to
> everyone asking "do you have it?", so you probably end up with 80
> hard drive heads seeking for the directory index sector. That's
> really bad when you're dealing with small files; it's like
> everything stopping for 50ms because of a single file lookup.
> Instead, send the request to the server where the file should be,
> and only if it isn't there, ask everybody else.
>
> Implementation: take a really fast, well-distributed 8-bit hash;
> basically you're splitting your files across 256 possible computers.
> You only have 2? Server one takes hashes 0-127, server two takes
> 128-255.
>
> Question: when a file isn't at the expected server, should we move
> it there? I don't know; I can always imagine a completely crazy
> Unify+AFR situation where someone could screw things up if he really
> put his mind to it. But if we don't move it, upgrading the cluster
> would mean having the old problem back, at least in the beginning.
>
> Problem: I'm assuming the servers are pretty much alike.
> Solution: use a bigger hash and add weights to the hash
> distribution.
>
> Problem 2: although the name explains how it works, I think there
> was another thing using the same name in the storage area, but I
> can't remember what ;-)... Two different things, same name: not
> good... Solution: the Colchete Scheduler? Just kidding... hahaha
> =====================
>
> The idea is not really original; look at what Google's Bigtable does
> to be scalable. PostgreSQL and Oracle also gain a lot by knowing
> exactly where to look for a piece of information. You could still
> run a partitioned Unify over a set of 3-AFR or 2-AFR volumes for
> increased reliability.
>
> Well, I don't think you would really care about this with fewer than
> about 6 servers. And if you have a small number of big files it
> wouldn't be very useful either, but that's the easy case everywhere.
>
> Comments?
>
> Best regards,
> Daniel Colchete
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
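
For anyone who wants to play with the idea, here is a minimal sketch in C of the partitioning rule Daniel describes: fold a path down to 8 bits and split the 256 buckets across the servers. The hash choice (FNV-1a) and every name below are assumptions made for illustration, not actual GlusterFS scheduler code.

/* part_sched.c - illustrative sketch only, not GlusterFS code.
 * FNV-1a is an arbitrary pick for the "really fast, well-distributed"
 * hash the proposal calls for; any comparable hash would do. */
#include <stdint.h>
#include <stdio.h>

/* Fold a path down to 8 bits, giving 256 possible buckets. */
static uint8_t bucket_of(const char *path)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (; *path; path++) {
        h ^= (uint8_t)*path;
        h *= 16777619u;                  /* FNV-1a prime */
    }
    /* XOR-fold the 32-bit hash into 8 bits. */
    return (uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24));
}

/* Split the 256 buckets evenly across nservers: with 2 servers,
 * server 0 owns buckets 0-127 and server 1 owns 128-255. */
static int server_of(const char *path, int nservers)
{
    return (int)bucket_of(path) * nservers / 256;
}

int main(void)
{
    const char *path = "/exports/mail/user42/cur/msg001";
    printf("%s -> server %d of 4\n", path, server_of(path, 4));
    return 0;
}

Handling unequal servers the way Daniel suggests would then just mean replacing the even split in server_of() with a lookup table that assigns each server a slice of the 0-255 range proportional to its weight.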