On 2016-07-06 16.57, Junio C Hamano wrote: > Torsten Bögershausen <tboegi@xxxxxx> writes: > >> At some point inside the merge, Git calls read-cache.c/ce_compare_data(), >> to check if the path named "file" is clean. >> According to the tree information, the path "file" has the sha1 99b633. >> #Note: >> #sha1 99b633 is "first line\nsame line\n" > > I thought your scenario was that our side had CRLF both in the blob > and in the working tree. So this is a different example? Our side > has LF in the blob that is checked out with CRLF in the working tree, > and their side has CRLF in the blob? > That was probably to confuse myself, and the rest of the world, sorry for confusion. This is the callstack: merge-recursive.c/add_cacheinfo(unsigned int mode, const unsigned char *sha1, const char *path, int stage, int refresh, int options) { struct cache_entry *ce; ce = make_cache_entry if (!ce) return error(_("addinfo_cache failed for path '%s'"), path); return add_cache_entry(ce, options); } #Calls read-cache.c/make_cache_entry(path=file sha1=0x99b633 stage=0) struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, unsigned int refresh_options) { [snip] ret = refresh_cache_entry(ce, refresh_options); if (ret != ce) free(ce); return ret; } #Calls refresh_cache_ent(&the_index, ce, options, NULL, NULL); #Calls ie_modified() #Calls read-cache.c/ie_match_stat() #Calls changed |= ce_modified_check_fs(ce, st); #Calls ce_compare_data(path=file sha1=0x99b633) #Calls index_fd(..., ..., ce->name, ...) #Note the sha is lost here, the parameter sha is the output. Deep down, has_cr_in_index(path) will look at ad55e2 instead, which is wrong. >> ce_compare_data() starts the work: >> OK, lets run index_fd(...,ce->name,...) >> # index_fd will simulate a "git add" and return the sha1 (via the sha1 pointer) >> # after the hashing. >> >> # ce_compare_data() will later compare ce->sha1 with the result stored in >> # the local sha1. That's why a sha1 is in the parameter list. >> # To return the resulting hash: >> >> ce_compare_data() calls index_fd(sha1, ..., ce->name, ...) >> >> #Down in index_fd(): >> >> sha1_file.c/index_fd() calls index_core() (after consulting the attributes) >> index_core() reads the file from the working tree into memory using >> read_in_full(), and after that index_mem() calls hash_sha1_file(buf) >> to calculate the sha1. >> >> Before the hashing is done, index_mem() runs >> convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE) >> to convert "blobs to git internal format". >> >> >> Here, convert_to_git() consults the .gitattributes (again) to find out that >> crlf_action is CRLF_AUTO in our case. >> The "new safer autocrlf handling" says that if a blob as any CR, don't convert >> CRLFs at "git add". >> >> convert.c/crlf_to_git() starts to do it's job: >> - look at buf (It has CRLF, conversion may be needed) >> - consult blob_has_cr() >> # Note: Before this patch, has_cr_in_index(path)) was used >> >> #Again, before this patch, >> has_cr_in_index(path) calls read_blob_data_from_cache(path, &sz) to read the >> blob into memory. >> >> Now read_blob_data_from_cache() is a macro, and we end up in >> read_blob_data_from_index(&the_index, (path), (sz)) >> >> read-cache.c/read_blob_data_from_index() starts its work: >> pos = index_name_pos(istate, path, len); >> if (pos < 0) { >> /* >> * We might be in the middle of a merge, in which >> * case we would read stage #2 (ours). >> */ >> >> # And here, and this is important to notice, "ours" is sha1 ad55e2, >> # which corresponds to "first line\r\nsame line\r\n" > > Where did 99b633 come from then? There still is something missing > in this description. > > Puzzled... This is an unfinished attempt for a commit message: -------------------------------------------------- correct ce_compare_data() at the end of a merge The following didn't work as expected: - At the end of a merge - merge.renormalize is true, - .gitattributes = "* text=auto" - core.eol = crlf Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob with LF "first line\nsame line\n". The expected result of the merge is "first line\nsame line\n". The content in the working tree is "first line\r\nsame line\r\n", and ce_compare_data() should find that the content is clean and return 0. The following callstack does not work: merge-recursive.c/add_cacheinfo(path=file sha1=0x99b633) #Calls refresh_cache_ent(&the_index, ce, options, NULL, NULL); #Calls ie_modified() #Calls read-cache.c/ie_match_stat() #Calls changed |= ce_modified_check_fs(ce, st); #Calls ce_compare_data(path=file sha1=0x99b633) #Calls index_fd(..., ..., ce->name, ...) #Note the sha is lost here. #Calls index_core() index_core() reads the file from the working tree into memory using read_in_full(), and after that index_mem() calls hash_sha1_file(buf) to calculate the sha1. Before the hashing is done, index_mem() runs convert_to_git(path, buf, size, &nbuf, SAFE_CRLF_FALSE) to convert "blobs to git internal format". Here, convert_to_git() consults the .gitattributes (again) to find out that crlf_action is CRLF_AUTO in our case. The "new safer autocrlf handling" says that if a blob as any CR, don't convert CRLFs at "git add". convert.c/crlf_to_git() starts to do it's job: - look at buf (It has CRLF, conversion may be needed) - consult blob_has_cr() # Note: Before this patch, has_cr_in_index(path)) was used #Before this patch, has_cr_in_index(path) #Calls read_blob_data_from_cache(path, &sz) to read the blob into memory. Now read_blob_data_from_cache() is a macro, and we end up in read_blob_data_from_index(&the_index, (path), (sz)) read-cache.c/read_blob_data_from_index() starts its work: pos = index_name_pos(istate, path, len); if (pos < 0) { /* * We might be in the middle of a merge, in which * case we would read stage #2 (ours). */ # And here, and this is important to notice, "ours" is sha1 ad55e2, # which corresponds to "first line\r\nsame line\r\n" The result of the check is that the blob 99b633 doesn't seem to match the working tree, and the whold merge is aborted: error: addinfo_cache failed for path 'file' Solution: Make sure that ce_compare_data() forwards the source sha1 into crlf_to_git(). Replace has_cr_in_index(path) with blob_has_cr(sha1), and forward the source sha1 from ce_compare_data() into blob_has_cr(sha1). While at it, rename has_cr_in_index() into blob_has_cr() and replace 0 with SAFE_CRLF_FALSE. Add a TC in t6038 to have a test coverage under Linux. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html