"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes: > Avery Pennarun <apenwarr@xxxxxxxxx> wrote: > > > > While we're here, it's probably worth mentioning that git's index file > > format (which stores a sequential list of full paths in alphabetical > > order, instead of an actual hierarchy) does become a bottleneck when > > you actually have a huge number of files in your repo (like literally > > a million). You can't actually binary search through the index! The > > current implementation of submodules allows you to dodge that > > scalability problem since you end up with multiple smaller index > > files. Anyway, that's fixable too. > > Yes. > > More than once I've been tempted to rewrite the on-disk (and I guess > in-memory) format of the index. And then I remember how painful that > stuff is in either C git.git or JGit, and I back away slowly. :-) > > Ideally the index is organized the same way the trees are, but > you still can't do a really good binary search because of the > ass-backwards name sorting rule for trees. But for performance > reasons you still want to keep the entire index in a single file, > an index per directory (aka SVN/CVS) is too slow for the common > case of <30k files. I guess that modern filesystems solve the problem of very many files in a single directory somehow (hash tables?). Perhaps the index file could borrow some such mechanism as an extension. Index for index? -- Jakub Narebski Poland ShadeHawk on #git -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html