Re: [PATCH 10/19] pack-bitmap: add support for bitmap indexes

On Fri, Oct 25, 2013 at 01:55:13PM +0000, Shawn O. Pearce wrote:

> > As an extra optimization, when `prepare_bitmap_walk` succeeds, the
> > `reuse_partial_packfile_from_bitmap` call can be attempted: it will find
> > the number of objects at the beginning of the on-disk packfile that can
> > be reused as-is, and return an offset into the packfile. The source
> > packfile can then be loaded and the bytes up to `offset` can be written
> > directly to the result without having to consider the entries inside the
> > packfile individually.
> 
> Yay! This is similar to the optimization we use in JGit to send the
> entire pack, but the part about sending a leading prefix is new. Do
> you have any data showing how well this works in practice for cases
> where offset is before length-20?

Actually, I don't think it kicks in very much due to packfile ordering.
You have all of the commits at the front of the pack, then all of the
trees, then all of the blobs. So if you want the whole thing, it is easy
to reuse a big chunk. But if you want only the most recent slice, we can
reuse the early bit with the new commits, but we stop partway through
the commit list. You still have to handle all of the trees and blobs
separately.

So in practice, I think this really only kicks in for clones anyway.
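
To make the ordering argument concrete, here is a minimal sketch in
Python (not the pack-bitmap code from the patch). It assumes a toy model
where the pack is a list of (type, size) entries in pack order and the
bitmap result is a list of booleans in that same order; the name
`reusable_prefix` and the object sizes are made up for illustration.

```python
# Toy model, not git's implementation: `entries` lists (type, size_in_bytes)
# in pack order, and `wanted` is the bitmap result in the same order.

def reusable_prefix(entries, wanted, header_size=12):
    """Return (object_count, byte_offset) for the leading run of wanted objects."""
    # A pack starts with a 12-byte header: signature, version, object count.
    offset = header_size
    count = 0
    for (_type, size), bit in zip(entries, wanted):
        if not bit:
            break  # first unwanted object ends the reusable prefix
        offset += size
        count += 1
    return count, offset

# Hypothetical pack: all commits first, then all trees, then all blobs.
pack = [("commit", 100)] * 4 + [("tree", 50)] * 4 + [("blob", 200)] * 4

# A clone wants every object, so the prefix covers the whole pack.
clone = [True] * 12

# A fetch of the newest slice wants one commit, one tree and one blob.
fetch = [True, False, False, False,
         True, False, False, False,
         True, False, False, False]

print(reusable_prefix(pack, clone))
print(reusable_prefix(pack, fetch))
```

In this model the clone reuses all 12 objects, while the fetch stops
after the first commit and has to pick up its tree and blob one by one.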

In theory, you could find "islands" of ones in the bitmap and send whole
slices of packfile at once. But you need to be careful not to send a
delta without its base. Which I think means you end up having to
generate the whole sha1 list anyway, and check that the other side has
each base before reusing a delta (i.e., the normal code path).
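
A rough sketch of the "islands" idea, again in Python over a toy model
rather than the real bitmap code. The helpers `islands` and
`deltas_missing_base` are hypothetical names, and `base_of` is an
assumed mapping from a delta's pack position to its base's position.

```python
# Toy model, not git's implementation: positions are indexes in pack order.

def islands(bits):
    """Yield half-open (start, end) ranges of consecutive set bits."""
    start = None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(bits))

def deltas_missing_base(base_of, sending, receiver_has):
    """Return positions being sent whose delta base is neither sent
    nor already present on the receiving side."""
    return [i for i in sorted(sending)
            if base_of.get(i) is not None
            and base_of[i] not in sending
            and base_of[i] not in receiver_has]

bits = [1, 1, 0, 0, 1, 1, 1, 0, 1]
runs = list(islands(bits))
sending = {i for start, end in runs for i in range(start, end)}

# Hypothetical delta chains: object 1 deltas against 0, object 5 against 3.
base_of = {1: 0, 5: 3}

print(runs)
print(deltas_missing_base(base_of, sending, receiver_has=set()))
```

Object 5 sits inside an island but its base (object 3) does not, so the
slice cannot be copied blindly. Answering that question for every delta
is the per-object work the island shortcut was meant to avoid.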

In fact, I'm not quite sure that even a partial reuse up to an offset is
100% safe. In a newly packed git repo it is, because we always put bases
before deltas (and OFS_DELTA objects need this). But if you had a bitmap
generated from a fixed thin pack, we would have REF_DELTA objects early
on that depend on bases appended to the end of the pack. So I really
wonder if we should scrap this partial reuse and either just have full
reuse, or go through the regular object_entry construction.
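
The property in question can be stated as a small check. This is only a
sketch under assumptions: `base_pos[i]` is the pack position of object
i's delta base (None for a non-delta object), and
`prefix_is_self_contained` is a made-up name. It is deliberately
conservative and ignores bases the receiver may already have.

```python
# Toy model, not git's implementation: base_pos[i] is the pack position of
# the delta base of object i, or None if object i is stored whole.

def prefix_is_self_contained(base_pos, n):
    """True if every delta among the first n objects has its base
    among those same n objects."""
    return all(base_pos[i] is None or base_pos[i] < n for i in range(n))

# Freshly repacked: bases always come before their deltas.
fresh = [None, 0, 1, None, 3]

# Fixed thin pack: object 0 is a REF_DELTA whose base was appended
# at the end of the pack (position 4).
fixed_thin = [4, None, 1, None, None]

print(prefix_is_self_contained(fresh, 3))
print(prefix_is_self_contained(fixed_thin, 3))
print(prefix_is_self_contained(fixed_thin, 5))
```

With bases-before-deltas ordering every prefix passes. In the fixed thin
pack case only the full pack does, which is the argument for either
reusing the whole pack or falling back to the regular path.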

Vicent, you've thought about the reuse code a lot more than I have. Any
thoughts?

-Peff