Re: [PATCH RFC 5/5] cache: Use ce_norm_sha1().

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 20 Apr 2010 00:25:00 -0700

"Henrik Grubbström (Grubba)"  <grubba@xxxxxxxxxx> writes:

> When the conversion filter for a file is changed, the file may get
> listed as modified even though the user has not made any changes to it.
> This patch makes the index ignore such changes. It also makes git-diff
> compare with the normalized content rather than the original content.

Hmm, I am not happy with this.  A typical use case I am imagining goes
like this:

 0. You have a project with LF line ending.  You clone to a filesystem
    that needs autocrlf but somehow it is not set, and end up with files
    with LF line ending in your working tree.

 1. You notice the mistake, and set autocrlf.  "git diff" does not say
    anything, as the index is clean.

 2. Once you fixed the line endings in the working tree files, however,
    "git diff" will say the files are different, but there is no actual
    change (i.e. you see "diff --git a/file b/file" and nothing else).

 3. "git update-index --refresh" does not improve the situation, as it
    thinks it knows the blob and the working tree file are different.
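
To illustrate why steps 2 and 3 go wrong, here is a rough sketch in Python
(not git's actual C code).  to_git_crlf() is a made-up stand-in for the
autocrlf half of convert_to_git(), and the file size stands in for the
cached stat information; both are assumptions for illustration only.

```python
import hashlib

def blob_sha1(data: bytes) -> str:
    # Git names a blob by sha1("blob <size>\0" + content).
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def to_git_crlf(data: bytes) -> bytes:
    # Hypothetical stand-in for the autocrlf side of convert_to_git().
    return data.replace(b"\r\n", b"\n")

blob = b"hello\nworld\n"            # content recorded in the index (LF)
old_worktree = blob                 # step 0: file checked out without autocrlf
new_worktree = b"hello\r\nworld\r\n"  # step 2: line endings fixed by hand

# The cached stat data (here: just the size) no longer matches the file,
# so the entry looks modified ...
stat_differs = len(old_worktree) != len(new_worktree)

# ... yet after conversion the content hashes to the very same blob.
content_differs = blob_sha1(to_git_crlf(new_worktree)) != blob_sha1(blob)
```

Here stat_differs is true while content_differs is false, which is the
"diff --git a/file b/file and nothing else" situation.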

I was hoping to see a solution where you will add a stronger version of
"refresh" without having to do anything else other than recording "how did
I munge the file in the working tree to produce the blob".  The third step
would change to:

 3. "git update-index --refresh" notices that the conversion parameters
    are different since the last time the files in the working tree were
    looked at (i.e. immediately after a "clone", working tree files are
    what git wrote out using convert_to_working_tree() and you know what
    conversion you used; after the user modified files in the working tree
    and said "git add", you know what conversion parameters you ran
    convert_to_git() with to produce blobs).  The paths that have different
    conversion parameters are re-indexed to see if they hash to the same
    sha1 as recorded in the index.  If they have changed, their index
    entries are left intact (i.e. you will still show the differences);
    otherwise you update the cached stat information for their index
    entries.
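
The revised step 3 could be sketched roughly as follows.  This is Python
pseudo-git, not a proposed implementation: the Entry class, its fields,
the "none"/"crlf" labels and strong_refresh() are all hypothetical names,
and convert_to_git() here is a toy stand-in for the real function.  Also
note that recording the new conversion on a match is my assumption about
what "recording how I munged the file" would mean in practice.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Entry:
    # Hypothetical, heavily simplified index entry.
    path: str
    sha1: str   # blob name recorded in the index
    conv: str   # conversion in effect when the entry was last looked at
    size: int   # stands in for the cached stat information

def blob_sha1(data: bytes) -> str:
    # Git names a blob by sha1("blob <size>\0" + content).
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def convert_to_git(data: bytes, conv: str) -> bytes:
    # Toy stand-in: only knows about CRLF -> LF.
    return data.replace(b"\r\n", b"\n") if conv == "crlf" else data

def strong_refresh(entries, worktree, current_conv):
    for e in entries:
        if e.conv == current_conv:
            continue  # ordinary --refresh rules apply; not sketched here
        data = worktree[e.path]
        # Re-index in memory only; no blob is written, nothing is stored.
        if blob_sha1(convert_to_git(data, current_conv)) == e.sha1:
            e.size = len(data)     # update cached stat information
            e.conv = current_conv  # assumption: remember the conversion used
        # Otherwise leave the entry intact so the difference still shows.

clean = b"a\nb\n"
entries = [
    Entry("same.txt", blob_sha1(clean), "none", len(clean)),
    Entry("edited.txt", blob_sha1(clean), "none", len(clean)),
]
worktree = {"same.txt": b"a\r\nb\r\n", "edited.txt": b"a\r\nB\r\n"}
strong_refresh(entries, worktree, "crlf")
```

After the call, the entry for same.txt has fresh stat data and is quiet
again, while edited.txt is untouched and keeps showing up as modified.
The point of the sketch is that nothing beyond the conversion label
needs to be kept around to get this behaviour.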

The above example scenario is about crlf conversion, but the same idea
should apply to other types of conversions (e.g. smudge/clean filter
pair), no?

I can see that it would be beneficial to store what conversions were used
to turn the input into the canonical version that ended up in the object
store and was registered in the index, but I am not sure why the re-indexed
versions need to be stored in the index at all (in-core, let alone
on-disk), or why they need to produce new blob objects.  What am I missing?