On Sun, Mar 17, 2013 at 08:21:13PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Thu, Jan 31, 2013 at 6:06 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> > On Wed, Jan 30, 2013 at 09:16:29PM +0700, Duy Nguyen wrote:
> >> Perhaps we could store abbrev sha-1 instead of full sha-1. Nice
> >> space/time trade-off.
> >
> > Following the on-disk format experiment yesterday, I changed the
> > format to:
> >
> >  - a list a _short_ SHA-1 of cached commits
> >  ..
> >
> > The length of SHA-1 is chosen to be able to unambiguously identify any
> > cached commits. Full SHA-1 check is done after to catch false
> > positives. For linux-2.6, SHA-1 length is 6 bytes, git and many
> > moderate-sized projects are 4 bytes.
>
> And if we are going to create index v3, the same trick could be used
> for the sha-1 table in the index. We use the short sha-1 table for
> binary search and put the rest of sha-1 in a following table (just
> like file offset table). The advantage is a denser search space, about
> 1/4-1/3 the size of full sha-1 table.

You can make it even smaller at some (potential) run-time cost. Keep in
mind you are just repeating information that is in the full sha1 list in
the index. So you could store a fixed-size offset into that list (e.g.,
32-bit), and then instead of comparing sha1s during a binary search, you
would dereference the offset to the real sha1s and compare those.

The run-time cost is not any worse in a big-O sense, but your cache
locality is much worse (you hit a second random page for each sha1
comparison), which might be noticeable. You'd have to benchmark to see
how big an impact.

-Peff
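
To make the offset idea concrete, here is a rough sketch in Python rather than git's C. Everything in it is an assumption for illustration: the names (build_full_table, build_offset_table, lookup), the big-endian 32-bit encoding, and the flat in-memory layout are hypothetical and do not describe git's actual index format. The cache stores only 4-byte positions into the sorted full sha1 list, and each step of the binary search dereferences a position to reach the real 20-byte sha1.

```python
import hashlib
import struct
from typing import Optional

SHA1_LEN = 20  # bytes in a full SHA-1


def build_full_table(sha1s) -> bytes:
    """Concatenate sorted 20-byte SHA-1s (stand-in for the full sha1 list)."""
    return b"".join(sorted(sha1s))


def build_offset_table(full_table: bytes, cached) -> bytes:
    """One 32-bit position per cached commit, kept in SHA-1 order.

    Positions index into full_table; because full_table is sorted,
    ascending positions are also ascending SHA-1s.
    """
    count = len(full_table) // SHA1_LEN
    positions = [
        i for i in range(count)
        if full_table[i * SHA1_LEN:(i + 1) * SHA1_LEN] in cached
    ]
    return struct.pack(f">{len(positions)}I", *positions)


def lookup(full_table: bytes, offset_table: bytes, want: bytes) -> Optional[int]:
    """Binary search the offset table; return the slot of `want` or None."""
    lo, hi = 0, len(offset_table) // 4
    while lo < hi:
        mid = (lo + hi) // 2
        (pos,) = struct.unpack_from(">I", offset_table, mid * 4)
        # The extra dereference: this read lands somewhere else in memory,
        # which is the cache-locality cost mentioned above.
        candidate = full_table[pos * SHA1_LEN:(pos + 1) * SHA1_LEN]
        if candidate == want:
            return mid
        if candidate < want:
            lo = mid + 1
        else:
            hi = mid
    return None


# Made-up data: 1000 fake object names, every third one "cached".
all_sha1s = [hashlib.sha1(b"commit %d" % i).digest() for i in range(1000)]
full = build_full_table(all_sha1s)
cached = set(all_sha1s[::3])
table = build_offset_table(full, cached)
```

The search is still O(log n) comparisons, and each entry costs a fixed 4 bytes regardless of repository size. The price is the second memory access per comparison, which this sketch cannot measure; that part would need benchmarking against a real index.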