Re: [RFC PATCH 10/10] pack-objects: improve partial packfile reuse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> The current code does so by creating a new entry in the reused_chunks
> array. In the worst case that can grow to have the same number of
> entries as we have objects. So this code was an attempt to pad the
> header of a shrunken entry to keep it the same size. I don't remember
> all the problems we ran into with that, but in the end we found that it
> didn't actually help much, because in practice we don't end up with a
> lot of chunks anyway.

Hmm, I am kind of surprised that the decoding side allowed such a
padding.

> For instance, packing torvalds/linux on our servers just now reused 6.5M
> objects, but only needed ~50k chunks.

OK, that's a good argument for not trying to optimize.

> I think the original code may simply have been buggy and nobody noticed.
> Here's what I wrote when this line was added in our fork:
>
>       pack-objects: check reused pack bitmap for duplicate objects
>   
>       If a client both asks for a tag by sha1 and specifies
>       "include-tag", we may end up including the tag in the reuse
>       bitmap (due to the first thing), and then later adding it to
>       the packlist (due to the second). This results in duplicate
>       objects in the pack, which git chokes on. We should notice
>       that we are already including it when doing the include-tag
>       portion, and avoid adding it to the packlist.
>   
>       The simplest place to fix this is right in add_ref_tag,
>       where we could avoid peeling the tag at all if we know that
>       we are already including it. However, I've pushed the check
>       instead into have_duplicate_entry. This fixes not only this
>       case, but also means that we cannot have any similar
>       problems lurking in other code.
>   
>       No tests, because git does not actually exhibit this "ask
>       for it and also include-tag" behavior. We do one or the
>       other on clone, depending on whether --single-branch is set.
>       However, libgit2 does both.
>
> I'm not sure why we didn't notice it with the older reuse code. It may
> simply have been that it hardly ever triggered on our servers.

Impressed by the careful thinking here.

Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux