Avery Pennarun <apenwarr@xxxxxxxxx> wrote:
>
> While we're here, it's probably worth mentioning that git's index file
> format (which stores a sequential list of full paths in alphabetical
> order, instead of an actual hierarchy) does become a bottleneck when
> you actually have a huge number of files in your repo (like literally
> a million). You can't actually binary search through the index! The
> current implementation of submodules allows you to dodge that
> scalability problem since you end up with multiple smaller index
> files. Anyway, that's fixable too.

Yes. More than once I've been tempted to rewrite the on-disk (and I
guess in-memory) format of the index. And then I remember how painful
that stuff is in either C git.git or JGit, and I back away slowly. :-)

Ideally the index is organized the same way the trees are, but you
still can't do a really good binary search because of the
ass-backwards name sorting rule for trees. But for performance reasons
you still want to keep the entire index in a single file; an index per
directory (aka SVN/CVS) is too slow for the common case of <30k files.

-- 
Shawn.
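To make the tree sorting rule mentioned above concrete, here is a minimal Python sketch. It assumes Git's behaviour of comparing a tree (directory) entry's name as if it ended in "/", while a file's name is compared as-is. The helpers entry_key, tree_sort and find_entry are hypothetical illustrations, not Git or JGit APIs. The point it shows: a bare name does not determine its own position, so a binary search has to probe twice (once as a file, once as a directory).

```python
from bisect import bisect_left

def entry_key(name: str, is_tree: bool) -> bytes:
    """Sort key for a tree entry: directories get an implicit trailing '/'."""
    raw = name.encode("utf-8")
    return raw + b"/" if is_tree else raw

def tree_sort(entries):
    """Return (entries in tree order, parallel list of their sort keys)."""
    ordered = sorted(entries, key=lambda e: entry_key(*e))
    return ordered, [entry_key(*e) for e in ordered]

def find_entry(ordered, keys, name):
    """Binary-search tree-ordered entries for `name`.

    The caller doesn't know whether `name` is a file or a directory,
    so both candidate keys must be probed.
    """
    for is_tree in (False, True):
        k = entry_key(name, is_tree)
        i = bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            return ordered[i]
    return None

# (name, is_tree) pairs; "foo" is a directory, the rest are files.
entries = [("foo0", False), ("foo", True), ("foo.c", False), ("foo-bar", False)]
ordered, keys = tree_sort(entries)
print([n for n, _ in ordered])        # ['foo-bar', 'foo.c', 'foo', 'foo0']
print(sorted(n for n, _ in entries))  # ['foo', 'foo-bar', 'foo.c', 'foo0']
print(find_entry(ordered, keys, "foo"))  # ('foo', True)
```

Because '-' (0x2D) and '.' (0x2E) sort below '/' (0x2F), the directory "foo" lands after "foo.c" in tree order, whereas a plain name sort puts it first.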