On Fri, Oct 25, 2013 at 01:55:13PM +0000, Shawn O. Pearce wrote:

> > As an extra optimization, when `prepare_bitmap_walk` succeeds, the
> > `reuse_partial_packfile_from_bitmap` call can be attempted: it will find
> > the amount of objects at the beginning of the on-disk packfile that can
> > be reused as-is, and return an offset into the packfile. The source
> > packfile can then be loaded and the bytes up to `offset` can be written
> > directly to the result without having to consider the entries inside the
> > packfile individually.
>
> Yay! This is similar to the optimization we use in JGit to send the
> entire pack, but the part about sending a leading prefix is new. Do
> you have any data showing how well this works in practice for cases
> where offset is before length-20?

Actually, I don't think it kicks in very much due to packfile ordering.
You have all of the commits at the front of the pack, then all of the
trees, then all of the blobs. So if you want the whole thing, it is easy
to reuse a big chunk. But if you want only the most recent slice, we can
reuse the early bit with the new commits, but we stop partway through
the commit list. You still have to handle all of the trees and blobs
separately.

So in practice, I think this really only kicks in for clones anyway.

In theory, you could find "islands" of ones in the bitmap and send whole
slices of packfile at once. But you need to be careful not to send a
delta without its base. Which I think means you end up having to
generate the whole sha1 list anyway, and check that the other side has
each base before reusing a delta (i.e., the normal code path).

In fact, I'm not quite sure that even a partial reuse up to an offset is
100% safe. In a newly packed git repo it is, because we always put bases
before deltas (and OFS_DELTA objects need this). But if you had a bitmap
generated from a fixed thin pack, we would have REF_DELTA objects early
on that depend on bases appended to the end of the pack.
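To make the ordering concern concrete, here is a toy model in Python. It
is not git code: `Entry`, `reusable_prefix`, `safe_prefix` and `islands`
are names made up for illustration, objects are addressed by their index
in the pack rather than by byte offset, and the model assumes the other
side has none of the bases already. It only shows why a prefix of set
bits is not automatically safe to copy when a delta's base sits later in
the pack.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """One object in on-disk pack order (hypothetical model, not git's struct)."""
    name: str
    wanted: bool                 # bit is set in the bitmap walk result
    base: Optional[int] = None   # pack index of the delta base, None if not a delta


def reusable_prefix(pack: List[Entry]) -> int:
    """Count the leading entries whose bits are all set."""
    n = 0
    for entry in pack:
        if not entry.wanted:
            break
        n += 1
    return n


def safe_prefix(pack: List[Entry]) -> int:
    """Shrink the prefix until every delta inside it also has its base inside it."""
    n = reusable_prefix(pack)
    i = 0
    while i < n:
        base = pack[i].base
        if base is not None and base >= n:
            # The base lies outside the prefix: cut just before this delta
            # and re-check the (now shorter) prefix from the start.
            n = i
            i = 0
        else:
            i += 1
    return n


def islands(pack: List[Entry]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of consecutive wanted entries."""
    runs = []
    start = None
    for i, entry in enumerate(pack):
        if entry.wanted and start is None:
            start = i
        elif not entry.wanted and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(pack)))
    return runs


# Freshly packed layout: commits, then trees, then blobs; no delta points forward.
fresh = [Entry("c1", True), Entry("c2", True), Entry("c3", False),
         Entry("t1", True), Entry("t2", False), Entry("b1", True)]

# "Fixed thin pack" layout: an early delta whose base was appended at the end.
thin = [Entry("d1", True, base=4), Entry("c2", True), Entry("x", False),
        Entry("y", True), Entry("base", True)]
```

In this model, `fresh` gives a prefix of two entries that is also safe,
while `thin` has the same two-entry prefix of set bits but a safe prefix
of zero, because `d1` would be sent without its base. The `islands`
helper only finds the runs; each run would still need the same base
check before its bytes could be copied.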
So I really wonder if we should scrap this partial reuse and either just
have full reuse, or go through the regular object_entry construction.

Vicent, you've thought about the reuse code a lot more than I have. Any
thoughts?

-Peff