On Thu, Sep 27, 2012 at 10:45:32AM -0700, Shawn O. Pearce wrote:

> On 2012-08-12 Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> > Long term we might gain slight lookup speedup if we know object type
> > as search region is made smaller. But for that to happen, we need to
> > propagate object type hint down to find_pack_entry_one() and friends.
> > Possible thing to do, I think.
>
> I'm not sure reclustering the index by object type is going to make a
> worthwhile difference. Of 2.2m objects in the Linux tree, 320k are
> commits. The difference between doing the binary search through all
> objects vs. just commits is only 2 iterations more of binary search if
> we assume the per-type ranges have their own fan-out tables.

To me the big win would be implicit indexing for items that are present
for every instance of a particular object type. So if we wanted to keep
the timestamp for every commit, you could have a "pack-*.timestamps"
that is literally just a packed list of uint32's, one per commit, where
the position of a commit's timestamp in the list is the same as its
position in the index of sha1s in the pack index.

That's simple to do if your index is just commits. But if it includes
all objects, then your list is sparse. So either you waste space by
making an empty slot for the non-commit objects, or you have an extra
level of indirection mapping the commit into the packed list, which is
going to double the storage in this case (though you could reuse that
extra mapping for the parent, generation number, etc, so it at least
gets amortized as you store more data).

Or is there some clever solution I'm missing?

For your extension, I don't think it matters. You're sparse even in the
commit-object space, so you have to store the mapping anyway. And your
data is big enough that the overhead isn't too painful.
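
To make the implicit-indexing idea concrete, here is a minimal sketch in C,
assuming a commit-only index. The names (struct commit_index, commit_pos,
commit_timestamp) and the in-memory layout are hypothetical and are not
taken from git's source or from any real "pack-*.timestamps" format; the
only point illustrated is that the timestamp array needs no keys, because
a commit's slot is its position in the sorted sha1 list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* Hypothetical in-memory view of a commit-only index plus its side file. */
struct commit_index {
	const unsigned char *sha1;   /* nr * SHA1_LEN bytes, sorted ascending */
	const uint32_t *timestamps;  /* nr entries; timestamps[i] pairs with sha1 i */
	size_t nr;                   /* number of commits */
};

/* Plain binary search over the sorted sha1s; returns 0 and sets *pos on a hit. */
static int commit_pos(const struct commit_index *idx,
		      const unsigned char *sha1, size_t *pos)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(idx->sha1 + mid * SHA1_LEN, sha1, SHA1_LEN);

		if (!cmp) {
			*pos = mid;
			return 0;
		}
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/* The implicit index: the position found above is also the timestamp slot. */
static int commit_timestamp(const struct commit_index *idx,
			    const unsigned char *sha1, uint32_t *out)
{
	size_t pos;

	assert(idx && sha1 && out);
	if (commit_pos(idx, sha1, &pos) < 0)
		return -1;
	*out = idx->timestamps[pos];
	return 0;
}
```

With an index that covers all objects, the same lookup would yield a
position in the full object list, so the timestamp array would either
need a slot for every non-commit object or a second table mapping that
position to a commit slot, which is the trade-off described above.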

-Peff