Re: How DELTA objects values work and are calculated

On 1/4/19 11:46 PM, Duy Nguyen wrote:
On Sat, Jan 5, 2019 at 9:49 AM Farhan Khan <khanzf@xxxxxxxxx> wrote:

Hi all,

I'm having trouble understanding how OBJ_OFS_DELTA and OBJ_REF_DELTA
(deltas) work in git. Where does git calculate the sha1 hash values
when doing "git index-pack" in builtin/index-pack.c? I think my lack
of understanding of the code is compounded by the fact that I do not
understand what the two object types are.

From tracing the code starting from index-pack, all non-delta object
type hashes are calculated in index-pack.c:1131 (parse_pack_objects).
However, when the function ends, the delta objects' hash values are
still set to all 0's.

Delta objects depend on other objects (which may themselves be deltas).
To calculate a delta object's sha1 value we may first need to
recursively resolve its base objects. This is why we do it in a
separate phase: the calculation is more complicated than for non-delta
objects.

My questions are:
A) How do Delta objects work?

A delta object consists of a reference to the base object (either an
sha1 value, or the offset to where the object is) and a "delta" to be
applied to that base (it's basically a binary diff).
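
To make the "binary diff" idea concrete, here is a minimal Python sketch
of applying a delta to a base, following the copy/insert instruction
encoding described in Documentation/technical/pack-format.txt. It is an
illustration only, not git's patch_delta(); the names read_varint and
apply_delta and the sample bytes are made up for this example. It
assumes the delta has already been zlib-inflated.

```python
def read_varint(buf, pos):
    """Read a little-endian base-128 size; return (value, new_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: last byte of the number
            return value, pos


def apply_delta(base, delta):
    """Rebuild the target bytes from `base` and an inflated `delta`."""
    # The delta starts with the expected base size and the result size.
    src_size, pos = read_varint(delta, 0)
    dst_size, pos = read_varint(delta, pos)
    if src_size != len(base):
        raise ValueError("base size does not match delta header")

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: low 4 bits say which offset bytes follow,
            # next 3 bits say which size bytes follow.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a zero size means 64 KiB
            if offset + size > len(base):
                raise ValueError("copy runs past the end of the base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert instruction: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("instruction byte 0 is reserved")

    if len(out) != dst_size:
        raise ValueError("result size does not match delta header")
    return bytes(out)


# Hypothetical delta: copy 6 bytes from offset 0 of the base, then
# insert the 4 literal bytes "git\n".
base = b"hello world\n"
delta = bytes([0x0C, 0x0A, 0x90, 0x06, 0x04]) + b"git\n"
print(apply_delta(base, delta))  # b'hello git\n'
```

The reference to the base (the 20-byte sha1 for OBJ_REF_DELTA, or a
backwards offset for OBJ_OFS_DELTA) is stored in the pack entry ahead of
this compressed delta data, so it does not appear in the bytes above.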

B) Where and how are the sha1 values calculated?

Start at threaded_second_pass() in index-pack.c. We go through all
delta objects here and try to calculate their sha1 values. Eventually
you'll hit resolve_delta(), where the delta is actually applied to the
base object in the patch_delta() call, and the sha1 value is calculated
in the following hash_object_file() call.
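
As a rough illustration of that last step: a git object name in a SHA-1
repository is the sha1 of a "<type> <size>\0" header followed by the
object's content. For a delta, the content is the fully reconstructed
data (the result of applying the delta) and the type is that of its
base. The Python sketch below shows the convention only; the helper name
git_object_id is made up here and this is not the code path inside
hash_object_file().

```python
import hashlib


def git_object_id(obj_type, content):
    """Return the hex sha1 git would assign to `content` of `obj_type`."""
    # Header is the type name, a space, the decimal size, and a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known object name.
print(git_object_id("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

This is also why a delta's hash cannot be filled in during the first
pass: the content to hash does not exist until the base has been
resolved and the delta applied to it.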


I have read Documentation/technical/pack-format.txt, but am still not clear.

Thank you!
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE





Hi Duy,

Thanks for explaining the Delta objects.

What does an OBJ_REF_DELTA object itself consist of? Do you have to uncompress it to parse its values? How do you get its size?

I read through resolve_deltas(), which leads to threaded_second_pass(), where you suggested I start, but I do not understand what is happening at a high level and get confused while reading the code.

From threaded_second_pass(), execution goes into a for-loop that runs resolve_base(), which runs find_unresolved_deltas(). Is this finding the unresolved deltas of the current object (the current OBJ_REF_DELTA we are going through)? This then runs find_unresolved_deltas() and shortly afterwards find_unresolved_deltas_1(). It seems that find_unresolved_deltas_1() is applying deltas, but I am not certain.

I do not understand what is happening in any of these functions. There are some comments at builtin/index-pack.c:883-904, but they did not clear this up for me.

Overall, I do not understand this entire process, what values to capture along the way, and how they are consumed. Please provide some guidance on how this process works.

Thank you!
Farhan


