Re: [PATCH v8 10/10] ce_compare_data() did not respect conversion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2016-05-02 21.33, Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
> Let's step back a bit and make sure we are on the same page.  I
> think this "series" conflates a bit too many things into a single
> topic.
> 
>  * The comparison between the index and the working tree, i.e. "git
>    diff", should compare result of convert_to_git() with what is in
>    the index, and the world around it should be made consistent with
>    that.  Your separate "git blame" fix to add missing knowledge
>    that convert_to_git() would not do s/CRLF/LF/ for a path whose
>    index entry already is contaminated with CR falls into this
>    category and was a very good thing to do.  
> 
>  * A convert_to_git() and convert_to_working_tree() pair that do not
>    roundtrip would (by definition) leave contents in the working
>    tree, that, when passed through convert_to_git(), will be
>    different from the index, upon completion of "reset --hard".  We
>    _should_ fix it so that "git diff" _reports_ differences.
>    Currently, lstat(2) based optimization hides this in a racy way
>    (when racy Git kicks in to reinspect the index and the working
>    tree file actually matches, it finds out that they do not match),
>    it is a bug that needs to be fixed, not 10/10 where it tries to
>    hide the differences consistently and spreads the bug.  I haven't
>    studied 8/10 carefully yet, but it seems to attempt the same.
> 
>  * I think the "text=auto eol=THIS" that did not mean "I do not care
>    to specify which ones are text files.  Please detect the file
>    type, and for those automatically detected, please make sure that
>    the contents follwo THIS eol convention." was a bug, and what
>    07/10 tries to do is a good thing.  
> 
> By the way, lack of the cover letter of this series made it more
> painful to write a reply like this than necessary.  A cover letter
> for a trivial 3-patch series might be overkill, but for anything
> with substance that spans more than 4-5 patches, a cover letter to
> describe the overall direction would really help.


The 10/10 needs to be replaced with something different, and I start to
get a better picture, why.
read_cache.c/ce_compare_data() checks what "git add" will change in the index.

sha1_file.c/index_fd() will read the file content from the working tree,
run convert_to_git() and calculate the sha1, which read_cache.c feeds into
hashcmp().

When convert_to_git() is run, 3 steps are taken:
- apply the filter, if any
- run crlf_to_git(), if attributes and core.autocrlf say so
- run ident, if specified.

Now the crlf_to_git() uses has_cr_in_index(const char *path) to find
out if we should keep the CRLF or not.

Side note: the suggested patches will use
get_convert_stats_sha1(get_sha1_from_cache(path),)

This works pretty well under normal "git add", but fails in t6038,
when we do a merge.

The new function get_sha1_from_index() can not find the sha1,
"read-cache.c/get_sha1_from_index:2392 path=file pos=-3"
and falls into the code path for
		/*
		 * We might be in the middle of a merge, in which
		 * case we would read stage #2 (ours).
		 */
read-cache.c/get_sha1_from_index:2408 path=file pos=3
read-cache.c/get_sha1_from_index:2416 path=file sha1=ad55e2

(The line number are with debug code)

The problem is, that ad55e2 is _not_ the sha1, which
read_cache/ce_compare_data() had been asked to look at.
Instead we should check the blob with 99b633.

The result is that convert/get_convert_stats_sha1() is called on the
wrong blob (the one with CRLF instead the one without CRLF), and this
makes t6038 from 9/10 fail.
10/10 rescues the situation, by using the correct blob :-)

In short, ce_compare_data() needs to forward ce->sha1 the whole way into
convert.c/crlf_to_git() and get_convert_stats_sha1().

While at it, I realized that we call a convert_attrs(&ca, path) a couple
of times (e.g. in would_convert_to_git(), to find out that we really don't have
any attributes set.

It could be nice to do that only once.

The next step will be to add the improvements/fixes for the ce_compare_data()
chain as described above, and then put 7/10..9/10 on top of that.
This will probably take some time, so that's why I asked if 1/10..4/10 could
proceed as is ?
(and the next version with cover letter, sorry for that)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]