On Mon, May 20, 2019 at 07:04:14PM -0400, Nicolas Pitre wrote: > > That still has some value even if your commit ends up with a question > > mark. There's not much to dig out of 636171cb80 (make index-pack able > > to complete thin packs., 2006-10-25). Adding Nico, maybe he still > > remembers... > > What about this comment in fix_unresolved_deltas(): > > /* > * Since many unresolved deltas may well be themselves base objects > * for more unresolved deltas, we really want to include the > * smallest number of base objects that would cover as much delta > * as possible by picking the > * trunc deltas first, allowing for other deltas to resolve without > * additional base objects. Since most base objects are to be found > * before deltas depending on them, a good heuristic is to start > * resolving deltas in the same order as their position in the pack. > */ > > Doesn't that cover it? Hmm. It does help, but only because my earlier comments were actually wrong. I was claiming that index-pack would not be able to resolve "A" from the delta chain when A is a regular delta, B is a thin delta, and C is not in the pack. Because I did not see how we would ever find base "B" in that third pass of fix_unresolved_deltas(), because we look only for objects we already have on disk. But I think the trick that I was missing is that if we see "B" first, we'll resolve it against C that we already have, but then we'll _also_ look for children that this now enables us to resolve. So we'd resolve "A" at that point, and then when we later hit "A" in the loop from fix_unresolved_deltas(), we skip it because of this: if (objects[d->obj_no].real_type != OBJ_REF_DELTA) continue; So this situation _can_ happen, and we do handle it properly. I don't know why the tests I showed in [1] didn't work. Obviously I botched something. I'm still not quite sure what would happen if we see "A" first (i.e., the delta is physically in the pack before its base, "B") and both are REF_DELTA. Right now we'd skip over A, knowing that we don't have B. And then when we get to B, we'd correctly resolve A on top of it (and if we have no such B, we'd eventually complain "hey, there are still some unresolved deltas"). But what happens if we _do_ have "B", but we just weren't able to tell the sender for some reason (e.g., it's unreferenced). We'd add a new copy of "B" to the pack while resolving "A", as part of --fix-thin. And then when we resolve "B", we'd get _another_ copy of "B" in the pack, and our result would have a duplicate. I don't think this happens in practice because we'd generally use OFS_DELTA in the first place these days. And even for REF_DELTA, I think we prefer to put bases before their deltas (i.e., we'd always see "B" before "A"). But I think if we ever _did_ see it (alternate implementation? malicious packfile?) we'd generate duplicates. I _think_ that would then cause us to barf due to the duplicate check from 68be2fea50 (receive-pack, fetch-pack: reject bogus pack that records objects twice, 2011-11-16). And that's true today, even without Jonathan's on-demand fetching patch. So I don't think that materially changes the requirements for correctness. It does mean that we might fetch (but not use!) a thin base we don't need, but only if the sender uses REF_DELTA for non-thin deltas, which we wouldn't normally do. -Peff [1] https://public-inbox.org/git/20190517043939.GA12063@xxxxxxxxxxxxxxxxxxxxx/