> On Tue, May 14, 2019 at 02:10:55PM -0700, Jonathan Tan wrote: > > > Support for lazy fetching should still generally be turned off in > > index-pack because it is used as part of the lazy fetching process > > itself (if not, infinite loops may occur), but we do need to fetch the > > REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that > > those are REF_DELTA themselves, because we do not send "have" when > > making such fetches.) > > I agree that the current implementation (and probably any sane > implementation) would not send us a delta if we have not provided any > haves. But this does mean that a malicious server could send a client > into an infinite loop. > > Pretty unlikely, but should we put some kind of circuit-breaker into the > client to ensure this? I thought of this - such a server could, but it seems to me that it would be similar to a server streaming random bytes to us without stopping (which is already possible). > > To resolve this, prefetch all missing REF_DELTA bases before attempting > > to resolve them. This both ensures that all bases are attempted to be > > fetched, and ensures that we make only one request per index-pack > > invocation, and not one request per missing object. > > Ah, but now things get more tricky. > > You are assuming that the server does not ever send a REF_DELTA unless > the base object is not present in the pack (it would use OFS_DELTA > instead). If we imagine a server which did, then there are two > implications: > > 1. We might pre-fetch a full copy of an object that we don't need. > It's just that it's stored as a delta in the pack which we are > currently indexing. > > 2. If we pre-fetch multiple objects, some of them may be REF_DELTAs > against each other, leading to an infinite loop. > > Off the top of my head, I am pretty sure your assumption holds for all > versions of Git that support delta-base-offset[1]. But that feels a lot > less certain to me. I could imagine an alternate server implementation, > for example, that is gluing together packs and does not try hard to > order the base before the delta, which would require it to use REF_DELTA > instead of OFS_DELTA. A cursory glance makes me think that REF_DELTA against a base object also in the pack is already correctly handled. Right before the invocation of conclude_pack() (which calls fix_unresolved_deltas(), the function I modified), resolve_deltas() is invoked. The latter invokes resolve_base() (directly or through threaded_second_pass()) which invokes find_unresolved_deltas(), which invokes find_unresolved_deltas_1(), which seems to handle both REF_DELTA and OFS_DELTA. Snipping the rest as I don't think we need to solve those if we can handle REF_DELTA being against an object in a pack, but let me know if you think that some of the points there still need to be addressed.