Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2016-07-02 00.11, Junio C Hamano wrote:
[snip]
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)

 	if (fd >= 0) {
 		unsigned char sha1[20];
-		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+		unsigned flags = HASH_USE_SHA_NOT_PATH;
+		memcpy(sha1, ce->sha1, sizeof(sha1));
+		if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
 			match = hashcmp(sha1, ce->sha1);
 		/* index_fd() closed the file descriptor already */
 	}

I do not quite see how "Don't use ce->sha1, but use sha1 given to
you" flag affects this call, when sha1 and ce->sha1 are the same due
to memcpy() just before the call?


Lets step back a second. and try to explain whats going on.

Do a merge (and renormalize the files).

At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
  to check if the path named "file" is clean.
  According to the tree information, the path "file" has the sha1 99b633.
  #Note:
  #sha1 99b633 is "first line\nsame line\n"

ce_compare_data() starts the work:
OK, lets run index_fd(...,ce->name,...)
# index_fd will simulate a "git add"  and return the sha1 (via the sha1 pointer)
# after the hashing.

# ce_compare_data() will later compare ce->sha1 with the result stored in
# the local sha1. That's why a sha1 is in the parameter list.
# To return the resulting hash:

ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)

#Down in index_fd():

sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.

Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert  "blobs to git internal format".


Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".

convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
  # Note: Before this patch, has_cr_in_index(path)) was used

#Again, before this patch,
has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
blob into memory.

Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))

read-cache.c/read_blob_data_from_index() starts its work:
	pos = index_name_pos(istate, path, len);
	if (pos < 0) {
		/*
		 * We might be in the middle of a merge, in which
		 * case we would read stage #2 (ours).
		 */

# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"
# On our way in the callstack the information what to check had been lost:
# blob 99b633 and nothing else.
# read_blob_data_from_index() doesn't know the sha, but makes a guess,
# derived from path.
# The guess works OK for most callers: to take "ours" in a middle
# of a merge, but not here.

#Back to crlf_to_git():
The (wrong) blob has CRs in it, so crlf_to_git() decides not to convert CRLFs,
(and this is wrong).

# And now we go all the way back in the call chain:

The content in the worktree which is "first line\r\nsame line\r\n" is hashed:
hash_sha1_file(buf, size, typename(type), sha1);
#and the result is stored in the return pointer "sha1": ad55e2

#Back to ce_compare_data():
match = hashcmp(sha1, ce->sha1);

#And here sha1=ad55e2 != ce->sha199b633

The whole merge is aborted:

   error: addinfo_cache failed for path 'file'
    file: unmerged (cbd69ec7cd12dd0989e853923867d94c8519aa52)
    file: unmerged (ad55e240aeb42e0d9a0e18d6d8b02dd82ee3e527)
    file: unmerged (99b633103c15c20cebebf821133ab526b0ff90b2)
    fatal: git write-tree failed to write a tree
#Side note: cbd69ec is another file in the working tree.

#######################

With this patch,
ce_compare_data() feeds the sha1 99b633 into
blob_has_cr(const unsigned char *index_blob_sha1).

crlf_to_git() says:
"first line\r\nsame line\r\n" in the worktree needs to be converted
into
"first line\nsame line\n" as the "new" blob.

The new (converted) blob "first line\nsame line\n"
is hashed into 99b633, so ce_compare_data() says
fine, converting "first line\r\nsame line\r\n" will give sha1 99b633,
lets continue with the merge, and do the normalization.

# Note1:
# The renormalization is outside what this patch fixes.
# But I didn't manage to construct a test case, which triggered
# "fatal: git write-tree failed to write a tree" without using
# merge.renormalize

# Note2:
# ce_compare_data() has (if I understand things right)
# no knowledge that a merge is going on
# ce_compare_data() has now knowledge that a merge
# with a merge with merge.renormalize=true is going on either.

#Note3:
# I hope that this explanation makes it more clear,
# why ce->sha1 must be forwarded into blob_has_cr() (to check the right blob)
# and that ce->name must be forwarded as path into crlf_to_git()
# (to check the .gitattributes)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]