Re: [PATCH 10/19] pack-bitmap: add support for bitmap indexes

On Wed, Oct 30, 2013 at 3:47 PM, Vicent Marti <vicent@xxxxxxxxxx> wrote:
> On Wed, Oct 30, 2013 at 9:10 AM, Jeff King <peff@xxxxxxxx> wrote:
>>
>> In fact, I'm not quite sure that even a partial reuse up to an offset is
>> 100% safe. In a newly packed git repo it is, because we always put bases
>> before deltas (and OFS_DELTA objects need this). But if you had a bitmap
>> generated from a fixed thin pack, we would have REF_DELTA objects early
>> on that depend on bases appended to the end of the pack. So I really
>> wonder if we should scrap this partial reuse and either just have full
>> reuse, or go through the regular object_entry construction.
>>
>> Vicent, you've thought about the reuse code a lot more than I have. Any
>> thoughts?
>
> Yes, our pack writing and bitmap code takes enough precautions to
> arrange the objects in the packfile in a way that can be partially
> reused, so for any given bitmap file written from Git, I'd say we're
> safe to always reuse the leader of the pack if this is possible.
>
> For bitmaps generated from JGit, however, we cannot make this
> assumption. I mean, we can right now (from my understanding of the
> current implementation for pack-objects on JGit), but they are free to
> change this in the future.

JGit certainly doesn't promise the ordering behavior, so the fact that
it's happening is just luck. The code could change in the future to
invalidate this.
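To make the ordering property concrete, here is a minimal sketch of the safety condition for partial reuse. It assumes a simplified in-memory list of pack entries in pack order; the `pack_entry` struct and `safe_reuse_prefix()` helper are hypothetical and are not git's actual pack-bitmap code. The rule being checked is that a leading slice of the pack can be copied verbatim only if every delta inside it finds its base inside that slice too. OFS_DELTA entries always satisfy this because their base must precede them. A REF_DELTA left over from a fixed thin pack may point at a base that was appended at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one pack entry; not git's real structures. */
enum entry_kind { ENTRY_BASE, ENTRY_OFS_DELTA, ENTRY_REF_DELTA };

struct pack_entry {
	enum entry_kind kind;
	size_t base_pos; /* pack-order index of the delta base; ignored for ENTRY_BASE */
};

/*
 * Length of the longest prefix of `entries` (in pack order) that could be
 * sent verbatim: every delta inside the prefix must find its base inside
 * the prefix as well. Returns 0 if no non-empty prefix is self-contained.
 */
size_t safe_reuse_prefix(const struct pack_entry *entries, size_t nr)
{
	size_t best = 0;
	size_t need = 0; /* minimum prefix length covering every base seen so far */

	for (size_t i = 0; i < nr; i++) {
		const struct pack_entry *e = &entries[i];

		/* An OFS_DELTA can only point backwards, so its base always precedes it. */
		assert(e->kind != ENTRY_OFS_DELTA || e->base_pos < i);

		/* A REF_DELTA (e.g. from a fixed thin pack) may point past this entry. */
		if (e->kind != ENTRY_BASE && e->base_pos + 1 > need)
			need = e->base_pos + 1;

		/* The prefix ending here is safe once all required bases are inside it. */
		if (need <= i + 1)
			best = i + 1;
	}
	return best;
}
```

For a freshly packed layout where bases come before deltas, every cut point is safe. For a layout like a fixed thin pack, with a REF_DELTA at position 0 whose base was appended at position 3, no shorter prefix qualifies and only full reuse of the four entries remains.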

> Obviously I intend to keep the pack reuse on production because the
> CPU savings are noticeable, but we can drop it from the public
> patchset.

I think you should keep it in; it's a significant improvement.

> Ideally, we'd have full pack reuse like JGit, but we cannot
> reasonably do that in GitHub because splitting a pack for the network
> root would double our disk usage for all the forks.

I gave a talk the week before about Git bitmaps and why we sometimes
have to slice pack files by object.

Some guy in the audience kept yelling that since it's Git, it's all open
source and `git clone` is "just" a file transfer problem. So maybe for
his GitHub repositories and forks its OK to include the entire fork
network when someone clones?  :-)