Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible

On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@xxxxxxxxxxxx> writes:
> 
> >> So when you pick the copy of Y out of another pack, what's so
> >> different?  After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
> 
> That is what we have always done even before we started bitmap based
> optimization.  If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X.  We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though.  All of that the bitmap-based optimization code path should
> be able to imitate, I would think.

A small nitpick on your final sentence here. As you note, we do not ever
write X before Y, because compute_write_order() always places bases
before their deltas in the output pack (and we do not allow cycles of
deltas, of course).

And even with bitmaps we'd do the same, as long as those objects are
both fed to the regular pack-writing machinery.
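
To make the "bases before deltas" rule concrete, here is a minimal sketch
of the idea. It is not the actual compute_write_order() code; the struct
and the add_to_order() helper are hypothetical names made up for this
illustration, and it assumes delta chains contain no cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified entry; not git's struct object_entry. */
struct entry {
	struct entry *base;  /* NULL for a non-delta object */
	int written;         /* set once the entry is placed in the order */
};

/*
 * Append e to out[], first appending its whole base chain if needed.
 * Returns the new number of entries in out[]. The caller must make
 * out[] large enough; delta chains are assumed to be cycle-free.
 */
static size_t add_to_order(struct entry *e, struct entry **out, size_t n)
{
	assert(e != NULL);
	if (e->written)
		return n;               /* already placed, nothing to do */
	if (e->base)
		n = add_to_order(e->base, out, n);  /* base goes first */
	e->written = 1;
	out[n++] = e;
	return n;
}
```

With Y as the base of X, calling add_to_order() on X first still puts Y
ahead of X in out[], so an offset pointing backwards from X to Y is
always available on this path.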

It is only the special verbatim-pack-reuse[1] code that is trying to
blit out the start of an existing pack that is affected. And in theory
there it _could_ try to reorder to produce an ofs delta, but in practice
the whole point is to take a single very cheap pass over the start of
the pack (or multiple packs in the case of the midx). Doing any
reordering would be counterproductive to the "cheap" adjective there (it
does not even keep a list of object ids it is sending), so we are better
off leaving those objects for the regular output code (which does make
such a list).

Taylor's series introduces an in-between where we choose not to reorder,
but switch to REF_DELTA. That is still cheap on CPU on the generating
side, though the resulting pack is slightly larger.

-Peff

[1] I wish we had good names to distinguish the various cases, because
    the term "reuse" is kind of overloaded. The "slower" regular
    object-sending path may still reuse verbatim bytes found in an
    on-disk pack. But this "blit out matching parts of a pack without
    otherwise considering the objects" feature happens outside of that.
    We called it "pack reuse" back in 2013, but that was not a good name
    even then. I don't have a good suggestion, though.



