On Tue, Feb 23, 2021 at 12:07:52AM -0800, Junio C Hamano wrote:

> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > (I found it convenient while developing this patch to have 'git
> > pack-objects' report the number of objects which were visited and got
> > their namehash fields filled in during traversal. This is also included
> > in the below patch via trace2 data lines).
>
> It does sound like a well thought out strategy to give name-hash to
> entries that we may have to find good delta bases afresh, while
> stopping upon hitting parts of the history we won't have to (either
> because they are in "excluded" packs, which you did here, or because
> they can take advantage of the "reuse existing delta base" logic [*],
> which we may want to look further into in future follow-on topics).
>
> [Footnote]
>
> * I presume that such a logic may, instead of stopping at an object
>   that is in an excluded pack, stop at an object that is stored in
>   the current pack as a delta and its base is also going to be
>   packed (and the latter by definition is always true, I presume, as
>   everything in the included pack would be packed)

I'm not sure if using deltas as a heuristic for stopping traversal makes
sense. They don't necessarily correspond to the history graph, or to
what was pushed.

E.g., if I see that tree X is a delta against tree Y, then we might say:
if Y is not excluded by being in one of the base packs, then we will
reuse the delta. We do not need the namehash of X, since we already know
its delta.

But that does not tell us anything about the subtrees and blobs
contained in X. We still want to traverse X in order to find out _their_
name hashes, because it is likely that we will need to delta some of
those.

Of course if you see a blob that is a delta that you plan to reuse, you
know you can stop there. But by the time you get to it, you already know
its namehash, and there is nothing left to traverse. :)

-Peff
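
To make the tree X example above concrete, here is a toy model in Python.
It is only a sketch and not git's code: the Obj class, its fields, and
both function names are hypothetical, and toy_name_hash is a stand-in
that is not claimed to match the hash pack-objects uses. The only thing
it tries to show is that stopping at objects in excluded packs is fine,
while stopping at a tree whose delta will be reused would leave its
children without name hashes.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for an object reached during traversal."""
    name: str                        # path at which the object was reached
    kind: str                        # "tree" or "blob"
    children: list = field(default_factory=list)
    in_excluded_pack: bool = False   # lives in a pack we are not repacking
    reuse_delta: bool = False        # stored as a delta we plan to reuse


def toy_name_hash(name):
    """Stand-in 32-bit hash over the path; whitespace is skipped."""
    h = 0
    for ch in name:
        if ch.isspace():
            continue
        h = ((h >> 2) + (ord(ch) << 24)) & 0xFFFFFFFF
    return h


def fill_name_hashes(root):
    """Walk from root and record a name hash for each visited object."""
    hashes = {}
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.in_excluded_pack:
            # Stop here, as the patch under discussion does for
            # objects found in excluded packs.
            continue
        hashes[obj.name] = toy_name_hash(obj.name)
        # Deliberately no "if obj.reuse_delta: continue": a tree whose
        # delta is reused may still contain subtrees and blobs that
        # need name hashes. A blob has no children, so there is
        # nothing left to skip in that case anyway.
        stack.extend(obj.children)
    return hashes


# Tree X is a reusable delta, yet one of its blobs is new and one
# lives in an excluded pack.
blob_new = Obj("src/new.c", "blob")
blob_old = Obj("src/old.c", "blob", in_excluded_pack=True)
tree_x = Obj("src", "tree", [blob_new, blob_old], reuse_delta=True)

hashes = fill_name_hashes(tree_x)
print(sorted(hashes))  # ['src', 'src/new.c']
```

Had the walk stopped at tree_x because of reuse_delta, src/new.c would
never have been given a name hash, which is the case described above.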