Re: bug report: "git pack-redundant --all" crash in minimize()

On Wed, Dec 16, 2020 at 02:22:52PM +0100, Daniel Klauer wrote:

> Background: bitbake downloads git repositories during a build process
> and supports caching them locally (in the form of bare repos in some
> user-defined directory). This prevents having to re-download them during
> the next build, and also it is a convenient mirroring/backup system in
> case the original URLs stop working.
> 
> As far as I can tell (since I'm not a bitbake developer) the git
> pack-redundant invocation is one of multiple calls meant to improve
> storage (probably minimize disk usage) of the locally cached git repos.
> For reference, please take a look at the other git commands it's
> invoking [1], and at the commit messages of the commits that added these
> invocations [2] [3] [4].
> 
> If doing it that way seems wrong, I'll report the issue to bitbake
> upstream too. Maybe there is a better way to do whatever bitbake wants
> to do here?

Thanks for that context.

I don't think it's _wrong_, in the sense that what they want to do
(remove redundant packs) is a reasonable thing to want. But in practice
I suspect that it rarely helps. It only makes sense if a pack is fully
made redundant by other packs. But that is unlikely to happen after a
fetch, because Git tries not to send objects the repo already has. So
while there could be overlap, it's unlikely that full packs are
candidates for deletion. And if any are, then that is probably a sign
that fetch is not being given enough information (e.g., if there are
packs being copied into the repo behind the scenes, make sure that there
are matching refs pointing to their objects, so Git knows it has that
part of the object graph).
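To make the "fully redundant" point concrete, here is a toy Python sketch. It models each pack as a hypothetical set of object IDs and greedily drops a pack only when the packs still being kept already cover every one of its objects. This is a simplified model assumed for illustration, not the actual minimize() logic in pack-redundant.

```python
from typing import Dict, List, Set


def redundant_packs(packs: Dict[str, Set[str]]) -> List[str]:
    """Return the packs that can be deleted without losing any object.

    Simplified greedy model (not git's real algorithm): visit packs from
    smallest to largest, and drop one only if the packs that are still kept
    (excluding itself) contain all of its objects. Because each drop leaves
    the union of kept objects unchanged, two packs that duplicate each other
    are never both dropped.
    """
    kept = dict(packs)
    dropped: List[str] = []
    for name in sorted(packs, key=lambda n: (len(packs[n]), n)):
        # Union of every other pack we are still keeping.
        others = set().union(*(objs for n, objs in kept.items() if n != name))
        if packs[name] <= others:
            dropped.append(name)
            del kept[name]
    return dropped
```

With packs {o1, o2, o3} and {o3, o4}, redundant_packs() returns an empty list: five object copies are stored but only four are unique, and deleting whole packs cannot recover the duplicate. Only rewriting the objects into a new pack can.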

For saving space, "git repack -ad" is a much better option. It puts
everything reachable into a single pack, which means:

  - if two packs contain duplicates of an object, we'll end up with only
    a single copy, even if those packs also contained some unique
    objects

  - by putting all objects in the same pack, we have more opportunities
    for delta compression between similar objects

  - we'll drop any unreachable objects completely (presumably this is
    desirable here, but if they're trying to keep objects that don't
    have refs pointing at them as part of some caching scheme, they
    might not want that; passing "-k" will keep the unreachable objects, too)
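For scripted cache maintenance, the repack step might be wrapped as below. This is a minimal Python sketch: the function names repack_all_cmd() and repack_all() are hypothetical, and it assumes git is on PATH. The flags are the ones discussed above: "-a", "-d", and optionally "-k".

```python
import subprocess
from typing import List


def repack_all_cmd(repo: str, keep_unreachable: bool = False) -> List[str]:
    """Build the argv for consolidating a repository into a single pack.

    -a  pack everything referenced into one pack
    -d  delete the old packs that the new pack makes redundant
    -k  (--keep-unreachable) with -ad, keep unreachable objects instead of
        dropping them
    """
    cmd = ["git", "-C", repo, "repack", "-a", "-d"]
    if keep_unreachable:
        cmd.append("-k")
    return cmd


def repack_all(repo: str, keep_unreachable: bool = False) -> None:
    # check=True raises CalledProcessError if git exits non-zero.
    subprocess.run(repack_all_cmd(repo, keep_unreachable), check=True)
```

Building the argument list separately from running it makes it easy to log or test the exact command before touching a real repository.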

Since they're doing other maintenance like "pack-refs", running
"git gc" may be preferable, as it would cover that, too. Use
"--prune=now" to drop the unreachable objects immediately (as opposed to
giving them a 2-week grace period). Note that git-gc has no equivalent
to repack's "-k", so if they need that, they'll have to invoke
git-repack directly.
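A sketch of choosing between the two routes, again in Python with hypothetical function names and git assumed on PATH. The manual route assumes "git pack-refs --all" is the ref maintenance they want alongside "repack -a -d -k"; adjust to whatever the existing maintenance step actually runs.

```python
import subprocess
from typing import List


def cache_maintenance_cmds(repo: str, keep_unreachable: bool = False) -> List[List[str]]:
    """Return the git commands to run, in order, for one cached repo."""
    base = ["git", "-C", repo]
    if not keep_unreachable:
        # gc covers ref packing and repacking; --prune=now drops unreachable
        # objects right away instead of after the default 2-week grace period.
        return [base + ["gc", "--prune=now"]]
    # gc has no equivalent of repack's -k, so run the steps by hand.
    return [
        base + ["pack-refs", "--all"],
        base + ["repack", "-a", "-d", "-k"],
    ]


def run_all(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        # Stop at the first failing step (raises CalledProcessError).
        subprocess.run(cmd, check=True)
```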

-Peff
