On Fri, Feb 1, 2013 at 5:15 PM, Jeff King <peff@xxxxxxxx> wrote:
> The short-sha1 is a clever idea. Looks like it saves us on the order of
> 4MB for linux-2.6 (versus the full 20-byte sha1). Not as big as the
> savings we get from dropping the other 3 sha1's to uint32_t, but still
> not bad.

We could save another 4 bytes per commit by using 3 bytes for storing
.idx offsets. linux-2.6 only has 3M objects. It'll take many years for
big projects to reach 16M objects and need the fourth byte in uint32_t.

> I guess the next steps in iterating on this would be:
>
>   1. splitting out the refactoring here into separate patches
>
>   2. squashing the cleaned-up bits into my patch 4/6
>
>   3. deciding whether this should go into a separate file or as part of
>      index v3. Your offsets depend on the .idx file having a sorted sha1
>      list. That is not likely to change, but it would still be nice to
>      make sure they cannot get out of sync. I'm still curious what the
>      performance impact is for mmap-ing N versus N+8MB.

  4. Print some cache statistics in "count-objects -v"

>> The length of SHA-1 is chosen to be able to unambiguously identify any
>> cached commits. Full SHA-1 check is done after to catch false
>> positives.
>
> Just to be clear, these false positives come because the abbreviation is
> unambiguous within the packfile, but we might be looking for a commit
> that is not even in our pack, right?

It may even be ambiguous within the pack, say an octopus (i.e. not
cached) commit that shares the same sha-1 prefix with one of the cached
commits.
--
Duy
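
For illustration only, here is a rough sketch of the 3-byte idea from the
reply above. It is not taken from any patch in this thread. The names
pos24_fits, pos24_put, pos24_get and POS24_MAX are made up for the example,
and big-endian order is an assumption chosen to match the network byte order
the existing .idx format uses. Three bytes cover positions 0 through
2^24 - 1 = 16777215, which is where the "16M objects" limit comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Largest position that fits in 3 bytes: 2^24 - 1 = 16777215. */
#define POS24_MAX 0xFFFFFFu

/* Hypothetical helper: returns 1 if pos fits in 3 bytes, 0 otherwise. */
static int pos24_fits(uint32_t pos)
{
	return pos <= POS24_MAX;
}

/*
 * Hypothetical helper: store pos (a position in the sorted sha1 list
 * of the .idx file) as 3 big-endian bytes. The caller must check
 * pos24_fits() first; a pack with 16M or more objects would need the
 * fourth byte.
 */
static void pos24_put(unsigned char *out, uint32_t pos)
{
	assert(pos24_fits(pos));
	out[0] = (pos >> 16) & 0xff;
	out[1] = (pos >> 8) & 0xff;
	out[2] = pos & 0xff;
}

/* Hypothetical helper: read back a 3-byte big-endian position. */
static uint32_t pos24_get(const unsigned char *in)
{
	return ((uint32_t)in[0] << 16) |
	       ((uint32_t)in[1] << 8) |
	       (uint32_t)in[2];
}
```

With four such fields per cached commit, one byte saved per field gives the
4 bytes per commit mentioned above. A writer would presumably have to fall
back to a wider format (or not write the cache at all) when pos24_fits()
fails, which this sketch leaves open.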