On Sun, Jan 6, 2019 at 5:32 AM Farhan Khan <khanzf@xxxxxxxxx> wrote:
> Hi Duy,
>
> Thanks for explaining the Delta objects.
>
> What does an OBJ_REF_DELTA object itself consist of?

From pack-format.txt:

  (deltified representation)
  n-byte type and length (3-bit type, (n-1)*7+4-bit length)
  20-byte base object name if OBJ_REF_DELTA or a negative relative
      offset from the delta object's position in the pack if this
      is an OBJ_OFS_DELTA object
  compressed delta data

> Do you have to uncompress it to parse its values?

The delta part is compressed, so yes. The "base object name" is not.

> How do you get its size?

Uncompress the delta until the end. The zlib stream has an
"end-of-stream" marker, so zlib knows when to stop. (The length in the
header is the size of the delta data after decompression; how many
compressed bytes it occupies in the pack is only known once inflate
reaches that marker.)

> I read through resolve deltas which leads to threaded_second_pass, where
> you suggested to start, but I do not understand what is happening at a
> high level and get confused while reading the code.
>
> From threaded_second_pass, execution goes into a for-loop that runs
> resolve_base(), which runs find_unresolved_deltas(). Is this
> finding the unresolved deltas of the current object (The current
> OBJ_REF_DELTA we are going through)? This then runs
> find_unresolved_deltas() and shortly afterwards
> find_unresolved_deltas_1(). It seems that find_unresolved_deltas_1() is
> applying deltas, but I am not certain.

Ah, I forgot how "fun" these functions were :)

The obvious way to resolve a delta object is to resolve (recursively)
its base object first, then apply the delta on top, and you are done.
However, that implies recursion, and it is also not very cache
friendly.

So what find_unresolved_deltas_1() does is work backward. It starts at
an already resolved (e.g. non-delta) base object, then applies deltas
for all delta objects that immediately depend on it, then continues to
resolve the delta objects depending on these children...
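To make the parent-to-child order concrete, here is a minimal C sketch on a toy model. struct obj, resolve_from_base() and the linear child scan are hypothetical stand-ins, not index-pack's real data structures: the real code does not scan every object to find children, and it applies an actual delta where this sketch only sets a flag. The point is only the traversal: start from a non-delta base, resolve its direct children, then their children, using an explicit stack instead of recursing from a delta up to its base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a pack entry. */
struct obj {
    int base;     /* index of the base object, -1 for a non-delta */
    int resolved; /* set once the object's full data is available */
};

/*
 * Resolve every delta reachable from the non-delta object `root`,
 * going from parent to children. `order` receives the order in which
 * objects are visited; `stack` must have room for `nr` entries.
 * Returns the number of objects visited.
 */
static size_t resolve_from_base(struct obj *objs, size_t nr, int root,
                                int *order, int *stack)
{
    size_t top = 0, n = 0;

    assert(objs[root].base == -1); /* must start from a non-delta */
    objs[root].resolved = 1;
    stack[top++] = root;

    while (top > 0) {
        int parent = stack[--top];
        order[n++] = parent;

        /* Toy stand-in for looking up the children of `parent`. */
        for (size_t i = 0; i < nr; i++) {
            if (objs[i].base == parent && !objs[i].resolved) {
                /*
                 * The parent's data is already available here, so the
                 * delta could be applied right away (this sketch only
                 * marks it). Each object is pushed at most once.
                 */
                objs[i].resolved = 1;
                stack[top++] = (int)i;
            }
        }
    }
    return n;
}
```

With object 0 a non-delta, objects 1 and 3 deltas against 0, and object 2 a delta against 1, the visit order is 0, 3, 1, 2: every base is complete before any delta built on it is touched, and nothing ever walks from a child back up to its base.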
The find_*_delta_children() functions find these deltas, then
find_unresolved_deltas_1() calls resolve_delta() to do the real work:

 - the delta type (OBJ_REF_... or OBJ_OFS_...) is already known at
   this point. I believe we know it from the first pass
 - the delta is uncompressed here, with get_data_from_pack()
 - the base object is obtained via get_base_data(), which is
   recursive, but since we go backwards from parent to child,
   base->data should already be valid and get_base_data() becomes a
   no-op

> I do not understand what is happening in any of these functions. There
> are some comments on builtin/index-pack.c:883-904
>
> Overall, I do not understand this entire process, what values to capture
> along the way, and how they are consumed. Please provide some guidance
> on how this process works.

An easier way to understand this is to actually run it through a
debugger (in single-thread mode). Create a small repo with a handful of
deltas. Use "git verify-pack -v" to see which objects are deltas and
where they are... then you have something to double check against
while you step through the code.

>
> Thank you!
> Farhan

--
Duy
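When stepping through with the "git verify-pack -v" listing at hand, it can help to decode an entry header by hand. Below is a minimal C sketch of the two variable-length fields from the pack-format.txt excerpt: the "n-byte type and length" header, and the OBJ_OFS_DELTA negative offset. It is not git's own code; the function names and PACK_OBJ_* constants are hypothetical, though the type numbers follow pack-format.txt. For an OBJ_REF_DELTA the 20-byte base object name follows the header directly; for an OBJ_OFS_DELTA the base entry starts the decoded number of bytes before the delta entry's own position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Object type numbers as listed in pack-format.txt. */
enum pack_type {
    PACK_OBJ_COMMIT = 1, PACK_OBJ_TREE = 2, PACK_OBJ_BLOB = 3,
    PACK_OBJ_TAG = 4, PACK_OBJ_OFS_DELTA = 6, PACK_OBJ_REF_DELTA = 7
};

/*
 * Decode the "n-byte type and length" header at buf[0..len).
 * First byte: MSB = "more bytes follow", then 3-bit type, then the
 * low 4 bits of the length. Each following byte adds 7 more bits.
 * Returns bytes consumed, or 0 if the header is truncated or too long.
 */
static size_t parse_entry_header(const unsigned char *buf, size_t len,
                                 int *type, uint64_t *size)
{
    size_t used = 0;
    unsigned shift = 4;
    unsigned char c;

    if (len == 0)
        return 0;
    c = buf[used++];
    *type = (c >> 4) & 7;
    *size = c & 15;
    while (c & 0x80) {
        if (used >= len || shift > 57) /* keep shifts within 64 bits */
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}

/*
 * Decode the OBJ_OFS_DELTA offset that follows the header. Unlike the
 * size, it is big-endian, and each continuation adds 1 before shifting
 * so that every offset has exactly one encoding.
 * Returns bytes consumed, or 0 on truncation or overflow.
 */
static size_t parse_ofs_offset(const unsigned char *buf, size_t len,
                               uint64_t *offset)
{
    size_t used = 0;
    unsigned char c;

    if (len == 0)
        return 0;
    c = buf[used++];
    *offset = c & 0x7f;
    while (c & 0x80) {
        if (used >= len || *offset >= ((uint64_t)1 << 56))
            return 0;
        c = buf[used++];
        *offset = ((*offset + 1) << 7) | (c & 0x7f);
    }
    return used;
}
```

For example, the bytes 0xB4 0x06 decode to a blob (type 3) of size 100, and an OFS_DELTA offset of 128 is encoded as 0x80 0x00 rather than as a single byte.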