Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 10 Oct 2024 09:46:00 -0700

Taylor Blau <me@xxxxxxxxxxxx> writes:

> This patch series enables more objects to be candidates for verbatim
> reuse when generating a pack with the aid of reachability bitmaps.
>
> By the end of this series, two new classes of objects are now reuse
> candidates which were not before. They are:
>
>   - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
>     were selected from different packs, the delta was not reusable.

Hmph.  Suppose that you need to send object X, you happen to have X
as an ofs-delta against Y, but Y may appear in multiple packs.

Even if the copy of Y you are going to send together with X is from
the same packfile, you may not be sending all the objects between X
and Y in the original local packfile, so you would need to recompute
the offset recorded in ofs-delta X to match the distance between X
and Y in the resulting packstream, no?

So when you pick the copy of Y out of another pack, what's so
different?  After emitting Y to the resulting pack stream (and
remembering where in the packstream you did so), when it is X's turn
to be emitted, shouldn't you be able to compute the distance in the
resulting packstream to represent X as an ofs-delta against Y, which
should already be happening when you had both X and Y in the same
original pack?
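
To make the question concrete, here is a rough sketch (Python, not
the actual pack-objects code) of the bookkeeping I have in mind.  The
OutputStream class and its emit() method are made up for
illustration; only the byte encoding of the distance follows the pack
format.  The point is that the distance depends solely on where Y
landed in the output, not on which source pack Y was copied from.

```python
PACK_HEADER_SIZE = 12  # "PACK" signature + version + object count


def encode_ofs_distance(distance: int) -> bytes:
    """Encode the distance to the base the way an ofs-delta entry stores it."""
    if distance <= 0:
        raise ValueError("base must precede the delta in the stream")
    out = [distance & 0x7F]
    distance >>= 7
    while distance:
        distance -= 1  # the format biases every continuation byte by one
        out.append(0x80 | (distance & 0x7F))
        distance >>= 7
    return bytes(reversed(out))


class OutputStream:
    """Hypothetical tracker of where each object lands in the new pack."""

    def __init__(self):
        self.pos = PACK_HEADER_SIZE
        self.offset_of = {}  # object name -> offset in the output

    def emit(self, oid, entry_len, base=None):
        """Record an entry; return the distance to its base, if any."""
        distance = None
        if base is not None:
            # A KeyError here means the base was not emitted earlier.
            distance = self.pos - self.offset_of[base]
        self.offset_of[oid] = self.pos
        self.pos += entry_len
        return distance


out = OutputStream()
out.emit("Y", 100)                 # Y may come from any source pack
out.emit("Z", 50)                  # unrelated object in between
dist = out.emit("X", 30, base="Y")
print(dist, encode_ofs_distance(dist).hex())
```

The entry lengths above are arbitrary, and the sketch ignores that
the length of X's own entry changes with the encoded distance.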

>   - Thin deltas. In both single- and multi-pack bitmaps, we did not
>     consider reusing deltas whose base object appears in the 'haves'
>     bitmap.

I hope this optimization does not kick in unless the receiving end
is prepared to do "index-pack --fix-thin".

I've never thought about this specifically, but it is interesting to
realize that by definition "thin" deltas cannot be ofs-deltas.
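
Spelled out as a sketch (Python, with made-up names; this is not how
the series implements it, and it assumes the receiver accepts
ofs-deltas at all), the choice looks something like this.  An
ofs-delta needs its base earlier in the same stream, while a thin
delta has its base only on the receiving end, so it can only name the
base by object name.

```python
def pick_delta_type(base_oid, offset_of, receiver_has, thin_ok):
    """Decide how a reused delta may refer to its base.

    offset_of    -- objects already written to the output stream
    receiver_has -- objects the other side is known to have ('haves')
    thin_ok      -- whether the receiver will fix up a thin pack
    All of these names are hypothetical.
    """
    if base_oid in offset_of:
        # Base is in the stream, so a relative offset can be computed.
        return "OFS_DELTA"
    if thin_ok and base_oid in receiver_has:
        # Base is outside the stream; only its object name can be sent.
        return "REF_DELTA"
    return None  # the existing delta cannot be reused as is


written = {"Y": 12}
haves = {"B"}
print(pick_delta_type("Y", written, haves, thin_ok=True))
print(pick_delta_type("B", written, haves, thin_ok=True))
print(pick_delta_type("B", written, haves, thin_ok=False))
```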

> Of course, REF_DELTAs have a less compact representation than
> OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> slightly larger pack.

Is comparing ref- vs ofs- delta a meaningful thing to do in the
context of this series?
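
For scale, the per-entry difference between the two is small.  A
ref-delta carries the full base object name (20 bytes with SHA-1),
while an ofs-delta carries a variable-length distance.  A rough
sketch in Python, where the function name is mine and the distances
are arbitrary examples rather than measurements:

```python
REF_DELTA_BASE_LEN = 20  # raw SHA-1 object name; 32 with SHA-256


def ofs_base_ref_len(distance: int) -> int:
    """Bytes needed to store an ofs-delta distance in the pack format."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    n = 1
    distance >>= 7
    while distance:
        distance -= 1  # continuation bytes are biased by one
        n += 1
        distance >>= 7
    return n


for distance in (100, 150, 20000, 5000000):
    ofs_len = ofs_base_ref_len(distance)
    print(distance, ofs_len, REF_DELTA_BASE_LEN - ofs_len)
```

The last column is the extra bytes per entry paid for using a
ref-delta where an ofs-delta would have been possible.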

What does the current code without these patches do in the same
situation?  Give up on reusing the existing delta, and then what?
If we send the base representation instead, the comparison is "we
currently do not use delta, but with this change we can reuse delta
(even though we do not bother to recompute the offset and instead
use ref-delta)".

Do we recompute the delta on the fly and show it as an ofs-delta
with the current code?  Then the comparison would be "we spend time
doing diff-delta once right now, but with this change we instead
reuse an existing one (even though we do not bother to recompute the
offset and instead use ref-delta)".