On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> >> So when you pick the copy of Y out of another pack, what's so
> >> different? After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
>
> That is what we have always done even before we started bitmap based
> optimization. If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X. We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though. All of that the bitmap-based optimization code path should
> be able to imitate, I would think.

A small nitpick on your final sentence here. As you note, we do not
ever write X before Y, because compute_write_order() always places
bases before their deltas in the output pack (and we do not allow
cycles of deltas, of course). And even with bitmaps we'd do the same,
as long as those objects are both fed to the regular pack-writing
machinery.

It is only the special verbatim-pack-reuse[1] code, which blits out the
start of an existing pack, that is affected.
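To make the OFS_DELTA versus REF_DELTA distinction concrete, here is a minimal C sketch of the decision. It is an illustration only, not git's code: the names choose_delta_kind(), ofs_delta_header_len() and base_ref_len() are hypothetical. It assumes that a delta can name its base with a backwards OFS_DELTA distance only if the base was already written at a smaller offset in the output pack; otherwise it has to name the base by object id as a REF_DELTA. The length calculation follows the variable-length offset encoding from git's pack-format documentation, while a REF_DELTA spends a full object id (20 bytes for SHA-1, 32 for SHA-256) on the base reference. The type/size header that precedes both is the same and is not counted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not git's implementation: decide how a delta X
 * names its base Y in the output pack, and what that reference costs.
 */
enum delta_kind { DELTA_OFS, DELTA_REF };

/*
 * base_off: offset at which base Y was written in the output pack,
 *           or -1 if Y has not been written (yet).
 * obj_off:  offset at which delta X is about to be written.
 */
static enum delta_kind choose_delta_kind(int64_t base_off, int64_t obj_off)
{
	/* An OFS_DELTA can only point backwards, at an already-written base. */
	if (base_off >= 0 && base_off < obj_off)
		return DELTA_OFS;
	/* Otherwise the base must be named by its object id. */
	return DELTA_REF;
}

/*
 * Bytes used by the OFS_DELTA varint for a backwards distance: 7 bits per
 * byte, decrementing the remaining value for each continuation byte, as in
 * the pack-format encoding.
 */
static size_t ofs_delta_header_len(uint64_t dist)
{
	size_t len = 1;
	while (dist >>= 7) {
		dist--;
		len++;
	}
	return len;
}

/* Cost of the base reference; hashsz is 20 (SHA-1) or 32 (SHA-256). */
static size_t base_ref_len(enum delta_kind kind, int64_t base_off,
			   int64_t obj_off, size_t hashsz)
{
	assert(hashsz == 20 || hashsz == 32);
	if (kind == DELTA_OFS)
		return ofs_delta_header_len((uint64_t)(obj_off - base_off));
	return hashsz;
}
```

For example, with Y written at offset 12 and X about to be written at offset 4096, choose_delta_kind(12, 4096) picks OFS_DELTA and the base reference costs 2 bytes. If Y has not been written yet (base_off of -1), the same delta falls back to REF_DELTA and costs 20 bytes with SHA-1. That per-object difference is the price of falling back to REF_DELTA instead of reordering.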
In theory, that code _could_ try to reorder to produce an ofs-delta,
but in practice the whole point is to take a single, very cheap pass
over the start of the pack (or multiple packs in the case of the midx).
Doing any reordering would defeat the "cheap" part (it does not even
keep a list of the object ids it is sending), so we are better off
leaving those objects to the regular output code (which does make such
a list).

Taylor's series introduces an in-between option: we still do not
reorder, but we switch to REF_DELTA. That is still cheap on CPU on the
generating side, though the resulting pack is slightly larger.

-Peff

[1] I wish we had good names to distinguish the various cases, because
    the term "reuse" is kind of overloaded. The "slower" regular
    object-sending path may still reuse verbatim bytes found in an
    on-disk pack. But this "blit out matching parts of a pack without
    otherwise considering the objects" feature happens outside of that.
    We called it "pack reuse" back in 2013, but that was not a good name
    even then. I don't have a good suggestion, though.