Re: Using bitmaps to accelerate fetch and clone

On Mon, Oct 1, 2012 at 8:07 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> You mentioned this before in your idea mail a while back. I wonder if
>> it's worth storing bitmaps for all packs, not just the self-contained
>> ones.
>
> Colby and I started talking about this late last week too. It seems
> feasible, but does add a bit more complexity to the algorithm used
> when enumerating.

Yes. Though on the server side, if it's too much trouble, the packer
can just ignore open packs and use only closed ones.
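
To make "closed" concrete, here is a minimal sketch in Python. It is
not Git or JGit code: the object graph is a plain dict and the names
(is_closed, usable_packs, refs) are made up for illustration. A pack
counts as closed when every object referenced from inside the pack is
also inside the pack.

```python
def is_closed(pack_objects, refs):
    """Return True if no object in the pack references an object outside it.

    pack_objects: set of object ids stored in the pack (hypothetical model).
    refs: dict mapping an object id to the ids it references directly
          (commit -> parents and tree, tree -> entries).
    """
    return all(
        target in pack_objects
        for oid in pack_objects
        for target in refs.get(oid, ())
    )


def usable_packs(packs, refs):
    """Names of the packs a packer could use if it ignores open packs."""
    return [name for name, objs in packs.items() if is_closed(objs, refs)]


# Toy data: pack "a" is self-contained, pack "b" points outside itself.
refs = {"c1": ["t1"], "c2": ["c1", "t2"]}
packs = {"a": {"c1", "t1"}, "b": {"c2"}}
closed = usable_packs(packs, refs)
```

With the toy data above only pack "a" qualifies, because "c2" in pack
"b" refers to "c1" and "t2", which live elsewhere.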

>> We could have one leaf bitmap per pack to mark all leaves where
>> we'll need to traverse outside the pack. Commit leaves are the best as
>> we can potentially reuse commit bitmaps from other packs. Tree leaves
>> will be followed in the normal/slow way.
>
> Yes, Colby proposed the same idea.
>
> We cannot make a "leaf bitmap per pack". The leaf SHA-1s are not in
> the pack and therefore cannot have a bit assigned to them.

We could mark all objects _in_ the pack that lead to an external
object. That's what I meant by leaves. We would need to parse those
leaves to find the actual SHA-1s that are outside the pack. Or we could
go with your approach below too.
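
A rough sketch of that marking, assuming one bit per object in pack
order and a Python int standing in for the bitmap. The function names
and the dict-based object graph are hypothetical, not anything that
exists in Git or JGit today.

```python
def boundary_bitmap(pack_order, refs):
    """Set bit i when object i of the pack references something outside it.

    pack_order: list of object ids in pack order (bit i <-> pack_order[i]).
    refs: dict mapping an object id to the ids it references directly.
    """
    in_pack = set(pack_order)
    bits = 0
    for i, oid in enumerate(pack_order):
        if any(target not in in_pack for target in refs.get(oid, ())):
            bits |= 1 << i
    return bits


def external_targets(pack_order, refs, bits):
    """Parse only the marked objects to recover the outside SHA-1s."""
    in_pack = set(pack_order)
    found = set()
    for i, oid in enumerate(pack_order):
        if (bits >> i) & 1:
            found.update(t for t in refs[oid] if t not in in_pack)
    return found


# Toy pack: c2 and t2 stay inside the pack, c1 points at c0 and t1 outside.
pack_order = ["c2", "t2", "c1"]
refs = {"c2": ["c1", "t2"], "t2": [], "c1": ["c0", "t1"]}
bits = boundary_bitmap(pack_order, refs)
outside = external_targets(pack_order, refs, bits)
```

Here only "c1" gets its bit set, so it is the only object that has to
be parsed to learn that "c0" and "t1" live outside the pack.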

> We could
> add a new section that listed the unique leaf SHA-1s in their own
> private table, and then assigned each bitmap a leaf bitmap with a bit
> set to 1 for every leaf object that is outside of the pack.
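
If I read that right, it would look roughly like the sketch below.
This is only my interpretation: the table layout, the sort order, and
the names build_leaf_table and leaf_bitmap are assumptions, and the
object graph is again a plain dict rather than real pack data.

```python
def build_leaf_table(pack_order, refs):
    """Collect the unique out-of-pack SHA-1s into their own sorted table."""
    in_pack = set(pack_order)
    leaves = sorted(
        {t for oid in pack_order for t in refs.get(oid, ()) if t not in in_pack}
    )
    # Position in the table is the bit number used by every leaf bitmap.
    return leaves, {sha: i for i, sha in enumerate(leaves)}


def leaf_bitmap(start, pack_order, refs, leaf_index):
    """Leaf bitmap for one stored bitmap: bit j is 1 when leaf j is
    reachable from `start` by walking only through in-pack objects."""
    in_pack = set(pack_order)
    bits, seen, stack = 0, set(), [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        for target in refs.get(oid, ()):
            if target in in_pack:
                stack.append(target)
            else:
                bits |= 1 << leaf_index[target]
    return bits


# Toy pack holding two commits and one tree; c1, t2 and b1 are elsewhere.
pack_order = ["c3", "c2", "t3"]
refs = {"c3": ["c2", "t3"], "c2": ["c1", "t2"], "t3": ["b1"]}
leaves, leaf_index = build_leaf_table(pack_order, refs)
tip_leaves = leaf_bitmap("c3", pack_order, refs, leaf_index)
```

The enumerator could then look up each set bit in the leaf table and
continue from those SHA-1s in other packs, without parsing any object
in this pack.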


> One of the problems we have seen with these non-closed packs is they
> waste an incredible amount of disk. As an example, do a `git fetch`
> from Linus's tree when you are more than a few weeks behind. You will
> get back more than 100 objects, so the thin pack will be saved and
> completed with additional base objects. That thin pack will go from a
> few MiBs to more than 40 MiB of data on disk, thanks to the redundant
> base objects being appended to the end of the pack. For most uses
> these packs are best eliminated and replaced with a new complete
> closure pack. The redundant base objects disappear, and Git stops
> wasting a huge amount of disk.

That's probably a different problem. I appreciate the disk savings, but
I would not want to wait a few more minutes for a repack on every
git-fetch. But if this bitmap thing makes repack much faster than it is
now, repacking after every git-fetch may become practical.
-- 
Duy