On Fri, 4 Dec 2009 11:19:42 -0500
Jeff Garzik <jgarzik@xxxxxxxxxx> wrote:

> On Thu, Dec 03, 2009 at 11:24:00PM -0700, Pete Zaitcev wrote:

I'm adding hail-devel to cc, because I'm going to explain where
the scalability screw-up comes from. Greg, feel free to drop off
by a reply.

> > I'm going to showcase a rather limited version (<= 10 nodes and
> > up to 1 million keys) by January. This is on par with what Eucalyptus
> > has _and_ has the data redundancy. So, someone could use it as
> > a replacement for a WebDAV server, I guess. So far, the best
> > market seems to be people who want to test their S3 applications
> > without setting up actual S3 accounts. That's about all it could do.

> Well, I think that is a degraded vision of what will be available.
>
> tabled can already do high availability w/ failover of the front-end
> and database (ie metadata). With your data replication patches,
> that gives object data high availability, too.
>
> Nobody outside of Amazon themselves can claim that... :)

The main issue is, we don't have a reverse index: there's no way
to know, given a node ID, what keys are affected by the node going
down. Therefore, in order to determine what keys have to be
re-replicated, we have to scan the whole database of keys. Which is
still not too bad if it can fit into RAM, but once it grows bigger,
it's a problem.

So, to an extent you can trade keys for nodes (and the total size,
since we consider, say, 1TB commodity disks per node). You can go
with fatter nodes, too, but that sort of defeats the purpose of
a cloud.

The naive solution is, let's add a secondary index. I foresee a
problem with it: the index is going to be bigger than the database
itself. NIDs are very small, 4 bytes each, and each key has up to
3 of them. So, a secondary index will push us out of RAM earlier,
and I have no idea what effects updates to it are going to have.
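
To make the trade-off concrete, here is a minimal sketch in Python.
It is not tabled's code: the dictionary layout, the sample keys and
the function names are all made up for illustration. It compares the
full scan with a NID-to-keys secondary index, and shows that the
index carries one entry per replica rather than one per key.

```python
from collections import defaultdict

# Hypothetical toy metadata: key -> up to 3 node IDs (NIDs) holding replicas.
key_to_nids = {
    "bucket/a": (1, 2, 3),
    "bucket/b": (2, 3, 4),
    "bucket/c": (1, 4, 5),
}

def affected_by_scan(db, dead_nid):
    """Full scan: look at every key to find those with a replica on dead_nid."""
    return sorted(key for key, nids in db.items() if dead_nid in nids)

def build_reverse_index(db):
    """Secondary index: NID -> set of keys stored on that node."""
    index = defaultdict(set)
    for key, nids in db.items():
        for nid in nids:
            index[nid].add(key)
    return index

def index_entries(index):
    """Number of (NID, key) pairs the index has to hold."""
    return sum(len(keys) for keys in index.values())

reverse = build_reverse_index(key_to_nids)

# Both approaches name the same keys for a failed node 4.
print(affected_by_scan(key_to_nids, 4))
print(sorted(reverse[4]))

# The index repeats each key once per replica.
print(len(key_to_nids), "keys,", index_entries(reverse), "index entries")
```

The scan costs one pass over all keys per node failure; the index
answers with a single lookup but stores every key up to three times
and has to be updated on every write.
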
It's something to try once someone has a big enough deployment
(say, 50 chunk nodes and 200,000+ keys, or some other ratio).

My plan to tackle this was to split the OID<->NID database away
from the KEY-->OID database, and use a compact RAM-based database
for OID<->NID. Now you see why OIDs are small and why tabled does
not use keys as keys in Chunk.

-- Pete

P.S. Actually, I may be able to compress keys in RAM with radix
encoding, if applications use filesystem-like key structure.
If they use something like SHA256 for keys, it won't work.
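
A rough illustration of the P.S., assuming "radix encoding" means
sharing common key prefixes the way a radix tree does. The sketch
below uses a plain character trie as a stand-in and only counts
stored characters, ignoring per-node pointer overhead. The key names
and helper functions are invented for the example.

```python
import hashlib

def raw_chars(keys):
    """Characters needed to store every key in full."""
    return sum(len(key) for key in keys)

def trie_chars(keys):
    """Characters stored when shared prefixes are kept only once.

    Each trie node holds one character, so the node count is the
    number of characters kept after prefix sharing.
    """
    root = {}
    count = 0
    for key in keys:
        node = root
        for ch in key:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count

# Hypothetical filesystem-like keys: a long common prefix, short unique tail.
path_keys = [f"photos/2009/12/img{i:04d}.jpg" for i in range(1000)]

# The same objects keyed by SHA-256 hex digests: no useful shared prefixes.
hash_keys = [hashlib.sha256(key.encode()).hexdigest() for key in path_keys]

for name, keys in (("path-like", path_keys), ("sha256", hash_keys)):
    print(name, raw_chars(keys), "raw chars ->", trie_chars(keys), "after sharing")
```

Path-like keys shrink to a fraction of their raw size because the
directory prefix is stored once. Hash-like keys diverge within the
first few characters, so almost nothing is shared.
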