Re: [PATCH] git exproll: steps to tackle gc aggression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King wrote:
> It depends on what each side has it, doesn't it? We generally try to
> reuse on-disk deltas when we can, since they require no computation. If
> I have object A delta'd against B, and I know that the other side wants
> A and has B (or I am also sending B), I can simply send what I have on
> disk. So we do not just blit out the existing pack as-is, but we may
> reuse portions of it as appropriate.

I'll raise some (hopefully interesting) points. Let's take the example
of a simple push: I start send-pack, which in turn starts receive_pack
on the server and connects its stdin/stdout to it (using git_connect).
Now, it reads the (sha1, ref) pairs it receives on stdin and spawns
pack-objects --stdout with the right arguments as the response, right?
Overall, nothing special: just pack-objects invoked with specific
arguments.

How does pack-objects work? ll_find_deltas() spawns some worker
threads to find_deltas(). This is where this get fuzzy for me: it
calls try_delta() in a nested loop, trying to find the smallest delta,
right? I'm not sure whether the interfaces it uses to read objects
differentiates between packed deltas versus packed undeltified objects
versus loose objects at all.

Now, the main problem I see is that a pack has to be self-contained: I
can't have an object "bar" which is a delta against an object that is
not present in the pack, right? Therefore no matter what the server
already has, I have to prepare deltas only against the data that I'm
sending across.

> Of course we may have to reconstruct deltas for trees in order to find
> the correct set of objects (i.e., the history traversal). But that is a
> separate phase from generating the pack's object content, and we do not
> reuse any of the traversal work in later phases.

I see. Are we talking about tree-walk.c here? This is not unique to
packing at all; we need to do this kind of traversal for any git
operation that digs into older history, no? I recall a discussion
about using generation numbers to speed up the walk: I tried playing
with your series (where you use a cache to keep the generation
numbers), but got nowhere. Does it make sense to think about speeding
up the walk (how?).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]