Re: [PATCH v3 3/3] correct ce_compare_data() in a middle of a merge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2016-07-06 16.57, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@xxxxxx> writes:
> 
>> At some point inside the merge, Git calls read-cache.c/ce_compare_data(),
>>   to check if the path named "file" is clean.
>>   According to the tree information, the path "file" has the sha1 99b633.
>>   #Note:
>>   #sha1 99b633 is "first line\nsame line\n"
> 
> I thought your scenario was that our side had CRLF both in the blob
> and in the working tree.  So this is a different example?  Our side
> has LF in the blob that is checked out with CRLF in the working tree,
> and their side has CRLF in the blob?
> 
That was probably to confuse myself, and the rest of the world,
sorry for confusion.

This is the callstack:

merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1,
		const char *path, int stage, int refresh, int options)
{
	struct cache_entry *ce;
	ce = make_cache_entry
	if (!ce)
		return error(_("addinfo_cache failed for path '%s'"), path);
	return add_cache_entry(ce, options);
}
#Calls
read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0)


struct cache_entry *make_cache_entry(unsigned int mode,
		const unsigned char *sha1, const char *path, int stage,
		unsigned int refresh_options)
{
        [snip]
	ret = refresh_cache_entry(ce, refresh_options);
	if (ret != ce)
		free(ce);
	return ret;
}

#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);

#Calls
ie_modified()

#Calls
read-cache.c/ie_match_stat()

#Calls
changed |= ce_modified_check_fs(ce, st);

#Calls
ce_compare_data(path=file sha1=0x99b633)

#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here, the parameter sha is the output.

Deep down, has_cr_in_index(path) will look at ad55e2 instead,
which is wrong.


>> ce_compare_data() starts the work:
>> OK, lets run index_fd(...,ce->name,...)
>> # index_fd will simulate a "git add"  and return the sha1 (via the sha1 pointer)
>> # after the hashing.
>>
>> # ce_compare_data() will later compare ce->sha1 with the result stored in
>> # the local sha1. That's why a sha1 is in the parameter list.
>> # To return the resulting hash:
>>
>> ce_compare_data() calls index_fd(sha1, ..., ce->name, ...)
>>
>> #Down in index_fd():
>>
>> sha1_file.c/index_fd() calls index_core() (after consulting the attributes)
>> index_core() reads the file from the working tree into memory using
>> read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
>> to calculate the sha1.
>>
>> Before the hashing is done, index_mem() runs
>> convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
>> to convert  "blobs to git internal format".
>>
>>
>> Here, convert_to_git() consults the .gitattributes (again) to find out that
>> crlf_action is CRLF_AUTO in our case.
>> The "new safer autocrlf handling" says that if a blob as any CR, don't convert
>> CRLFs at "git add".
>>
>> convert.c/crlf_to_git() starts to do it's job:
>> - look at buf (It has CRLF, conversion may be needed)
>> - consult blob_has_cr()
>>   # Note: Before this patch, has_cr_in_index(path)) was used
>>
>> #Again, before this patch,
>> has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the
>> blob into memory.
>>
>> Now read_blob_data_from_cache() is a macro, and we end up in
>> read_blob_data_from_index(&the_index, (path), (sz))
>>
>> read-cache.c/read_blob_data_from_index() starts its work:
>> 	pos = index_name_pos(istate, path, len);
>> 	if (pos < 0) {
>> 		/*
>> 		 * We might be in the middle of a merge, in which
>> 		 * case we would read stage #2 (ours).
>> 		 */
>>
>> # And here, and this is important to notice, "ours" is sha1 ad55e2,
>> # which corresponds to "first line\r\nsame line\r\n"
> 
> Where did 99b633 come from then?  There still is something missing
> in this description.
> 
> Puzzled...
This is an unfinished attempt for a commit message:
--------------------------------------------------
correct ce_compare_data() at the end of a merge

The following didn't work as expected:

 - At the end of a merge
 - merge.renormalize is true,
 - .gitattributes = "* text=auto"
 - core.eol = crlf

Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".

The expected result of the merge is "first line\nsame line\n".

The content in the working tree is "first line\r\nsame line\r\n",
and ce_compare_data() should find that the content is clean and return 0.

The following callstack does not work:
merge-recursive.c/add_cacheinfo(path=file sha1=0x99b633)
#Calls
refresh_cache_ent(&the_index, ce, options, NULL, NULL);

#Calls
ie_modified()

#Calls
read-cache.c/ie_match_stat()

#Calls
changed |= ce_modified_check_fs(ce, st);

#Calls
ce_compare_data(path=file sha1=0x99b633)

#Calls
index_fd(..., ..., ce->name, ...)
#Note the sha is lost here.

#Calls
index_core()

index_core() reads the file from the working tree into memory using
read_in_full(), and after that index_mem() calls hash_sha1_file(buf)
to calculate the sha1.
Before the hashing is done, index_mem() runs
convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE)
to convert  "blobs to git internal format".

Here, convert_to_git() consults the .gitattributes (again) to find out that
crlf_action is CRLF_AUTO in our case.
The "new safer autocrlf handling" says that if a blob as any CR, don't convert
CRLFs at "git add".

convert.c/crlf_to_git() starts to do it's job:
- look at buf (It has CRLF, conversion may be needed)
- consult blob_has_cr()
  # Note: Before this patch, has_cr_in_index(path)) was used

#Before this patch,
has_cr_in_index(path)
#Calls
read_blob_data_from_cache(path, &sz) to read the blob into memory.

Now read_blob_data_from_cache() is a macro, and we end up in
read_blob_data_from_index(&the_index, (path), (sz))

read-cache.c/read_blob_data_from_index() starts its work:
	pos = index_name_pos(istate, path, len);
	if (pos < 0) {
		/*
		 * We might be in the middle of a merge, in which
		 * case we would read stage #2 (ours).
		 */

# And here, and this is important to notice, "ours" is sha1 ad55e2,
# which corresponds to "first line\r\nsame line\r\n"

The result of the check is that the blob 99b633 doesn't seem
to match the working tree, and the whold merge is aborted:
  error: addinfo_cache failed for path 'file'

Solution:
Make sure that ce_compare_data() forwards the source sha1 into crlf_to_git().

Replace has_cr_in_index(path) with blob_has_cr(sha1), and forward
the source sha1 from ce_compare_data() into blob_has_cr(sha1).

While at it, rename has_cr_in_index() into blob_has_cr()
and replace 0 with SAFE_CRLF_FALSE.

Add a TC in t6038 to have a test coverage under Linux.









--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]