Hello y'all!

I've been away from the community lately as I had to focus on some other stuff here at work, but I'm still really anxious for GlusterFS 1.3.8 to be released so I can resume my test and production environments again :).

As I was just going to bed and thinking about millions of small files as a solution to a problem I'm trying to solve here, I had this idea:

The Partitioner Scheduler
=========================

One-liner: it's a scheduler that chooses which file goes to which server based on a 1-byte (8-bit) hash of its name (or its path inside the Gluster mount plus its name).

What do you win? You know where to look for the file.

Picture this: 10 servers, each an 8-HD 3TB RAID6, unified (no AFR). Question: where is file x? Unify would send a request to every server asking: do you have it? So you probably have 80 hard drive heads seeking the directory index sector. That's really bad when you're dealing with small files; it's like everything stopping for 50ms because of a file lookup. Instead, request the file from the server where it should be. If it isn't there, ask everybody else.

Implementation: get a really fast, well-distributed 8-bit hash; basically you're splitting your files among 256 possible computers. You only have 2? Server one is 0-127, server two is 128-255.

Question: when a file isn't at the expected server, should we move it there? I don't know; I can always imagine a completely crazy Unify+AFR situation where someone could screw things up if he really puts his mind to it. But if we don't move it, upgrading the cluster would mean having the old problem back, at least at the beginning.

Problem: well, I'm assuming the servers are pretty much alike.
Solution: get a bigger hash and add weights to the hash distribution.

Problem 2: although the name explains how it works, I think there was another thing using the same name in the storage area, but I can't remember what ;-)... Two different things, same name, not good...
Solution: The Colchete's Scheduler? Just kidding...
hahaha

=====================

The idea is not really original if you look at what Google's Bigtable does to be scalable. PostgreSQL and Oracle also achieve a lot by knowing where to look for a piece of information.

You can still have a partitioned Unify over a lot of 3-AFR or 2-AFR sets to increase reliability.

Well, if you have fewer than 6 servers you wouldn't really care about this, I think. If you have a small number of big files this wouldn't be of much use either, but that's the easy case everywhere.

Comments?

Best regards,
Daniel Colchete
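
The lookup scheme described above could be sketched roughly as follows. This is only an illustration in Python, not GlusterFS code: the names name_hash, build_ranges, expected_server and lookup are hypothetical, CRC32 folded to 8 bits is just a stand-in for the "really fast well distributed hash", and servers are modeled as in-memory sets of paths. The weights argument is one possible reading of "add weights to the hash distribution"; the post suggests a bigger hash for that case, which this sketch does not do.

```python
import bisect
import zlib

HASH_SPACE = 256  # 8-bit hash: 256 possible slots, as in the proposal


def name_hash(path):
    """Map a path to 0..255. CRC32 is an assumed stand-in hash."""
    return zlib.crc32(path.encode("utf-8")) & 0xFF


def build_ranges(weights):
    """Split 0..255 into contiguous slices, one per server.

    Returns the exclusive upper bound of each server's slice.
    Equal weights give equal slices, e.g. two servers -> 0-127 and 128-255.
    """
    total = sum(weights)
    bounds = []
    running = 0
    for weight in weights:
        running += weight
        bounds.append(running * HASH_SPACE // total)
    return bounds


def expected_server(path, bounds):
    """Index of the server whose slice contains the hash of this path."""
    return bisect.bisect_right(bounds, name_hash(path))


def lookup(path, servers, bounds):
    """Ask the expected server first; fall back to everybody else.

    Returns (server_index or None, number_of_servers_asked).
    A real fallback would probably query the others in parallel;
    here they are checked one by one to keep the sketch simple.
    """
    first = expected_server(path, bounds)
    asked = 1
    if path in servers[first]:
        return first, asked
    for index, contents in enumerate(servers):
        if index == first:
            continue
        asked += 1
        if path in contents:
            return index, asked
    return None, asked


if __name__ == "__main__":
    bounds = build_ranges([1, 1])  # two similar servers
    servers = [set(), set()]
    path = "/mail/user42/msg-0001"
    servers[expected_server(path, bounds)].add(path)
    print(lookup(path, servers, bounds))  # found on the first server asked
```

When the file sits where the hash says it should, only one server is asked. A file stored elsewhere (for example, one written before the cluster was enlarged) still gets found, but at the cost of the old ask-everybody behavior.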