Junio C Hamano <junkio@xxxxxxx> wrote:
> Nicolas Pitre <nico@xxxxxxx> writes:
> > The idea is to deal with only tree objects containing the 64K most
> > frequently used base names and fall back to the current tree object
> > encoding for objects that couldn't be represented that way.
>
> Ah, I was wondering the same thing as Linus after seeing shawn
> talked about the 2-byte prefix on #git.  Falling back to an
> alternate encoding for rarer cases makes sense.

Right.

Git is already fast, and already compresses the object data very
well.  But I think we can make things faster without violating the
basic assumptions of "whole project history", and it just turns out
that those encodings are also making the data smaller for the common
case of human maintained source code.  Which of course is one of the
primary uses for Git, but is obviously not the only use.

In the worst case scenario we'll be doing exactly what we are doing
today with regards to encoding.  That performance and disk space
usage is already known and considered "very, very fast" and "very
small".  ;-)

In the best case scenario (human managed source like linux.git,
git.git) we'll scream with pack v4.  The rev-list stats I posted
from just the tree encoding switch not only saved 3 MiB of disk
space but improved total running time by 12.5%.  Nico and I know
we can still do better.

With 15k basenames in linux.git we're filling only 23.6% of the
available namespace within a single packfile.  I think that by the
time we have enough basenames to break 64K we'll be several years
out and be talking about historical packs vs. active packs.

--
Shawn.
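
For readers who want to see the shape of the idea, here is a rough sketch of "index the most frequent basenames with 2 bytes, fall back to the current encoding otherwise". It is not the pack v4 format: the tag bytes, the field layout, and the function names (build_name_table, encode_indexed, encode_tree, namespace_fill) are all made up for illustration. Only the canonical entry layout ("<octal mode> <name>\0" plus a 20-byte SHA-1) matches what Git writes today.

```python
import struct
from collections import Counter

MAX_NAMES = 1 << 16  # 64K slots addressable by a 2-byte index


def build_name_table(trees, limit=MAX_NAMES):
    """Map the `limit` most frequently used basenames to small integers."""
    counts = Counter(name for tree in trees for _mode, name, _sha in tree)
    return {name: i for i, (name, _n) in enumerate(counts.most_common(limit))}


def encode_canonical(tree):
    """Current-style entries: b"<octal mode> <name>\\0" + 20-byte SHA-1."""
    return b"".join(b"%o %s\0" % (mode, name) + sha for mode, name, sha in tree)


def encode_indexed(tree, table):
    """Hypothetical compact entries: 2-byte name index, 2-byte mode, SHA-1.

    Returns None when any basename is missing from the table.
    """
    out = []
    for mode, name, sha in tree:
        idx = table.get(name)
        if idx is None:
            return None
        out.append(struct.pack(">HH", idx, mode) + sha)
    return b"".join(out)


def encode_tree(tree, table):
    """Use the compact form when possible, else fall back to canonical.

    The leading tag byte (b"I" or b"C") is an assumption of this sketch.
    """
    packed = encode_indexed(tree, table)
    if packed is not None:
        return b"I" + packed
    return b"C" + encode_canonical(tree)


def namespace_fill(n_names):
    """Fraction of the 2-byte index space used by n_names basenames."""
    return n_names / MAX_NAMES
```

A tree whose names are all in the table takes the compact path, and a tree with even one unknown name is written exactly as it is today, which is the worst case described above. As a sanity check on the numbers, 23.6% of 65,536 slots is roughly 15.5k names, in line with the "15k basenames" figure.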