Re: [PATCH 3/3] grep: stop looking at random places for .gitattributes

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 10 Oct 2012 12:44:46 -0700

Johannes Sixt <j.sixt@xxxxxxxxxxxxx> writes:

> Is there already an established definition which the "correct"
> .gitattributes are? IIRC, everywhere else we are looking at the
> .gitattributes in the worktree, regardless of whether the object at the
> path in question is in the worktree, the index, or in an old commit.

No, and it is deliberately kept vague while waiting for us to come
up with a clear definition of what is "correct".

We could declare, from a purist's point of view, that the attributes
should be taken from the same place as the path in question is taken
from.  When running "git add foo.c", we grab the contents of "foo.c"
from the working tree, so ".gitattributes" from the working tree
should be applied when dealing with "foo.c".  Similarly, the contents
of blob "foo.c" that "git checkout foo.c" reads from the index would
get attributes from ".gitattributes" in the index (to find what its
smudging semantics are) before they get written out to the working
tree.  "git diff A B" may apply the attributes from tree A to the
preimage side while using the attributes from tree B for the
postimage side.
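
To make the purist rule concrete, here is a rough sketch in Python.
It is not how git is implemented; the names "Source" and
"purist_attr_source" are made up for illustration.  It only maps the
three examples above to the place the attributes would come from.

```python
from enum import Enum


class Source(Enum):
    WORKTREE = "working tree"
    INDEX = "index"
    TREE_A = "tree A"
    TREE_B = "tree B"


def purist_attr_source(operation, side=None):
    """Return where .gitattributes would be read under the purist rule.

    Hypothetical helper: the attributes come from the same place the
    contents of the path come from.
    """
    if operation == "add":
        # "git add foo.c" reads foo.c from the working tree.
        return Source.WORKTREE
    if operation == "checkout":
        # "git checkout foo.c" reads the blob from the index.
        return Source.INDEX
    if operation == "diff":
        # "git diff A B": preimage comes from A, postimage from B.
        if side == "preimage":
            return Source.TREE_A
        if side == "postimage":
            return Source.TREE_B
        raise ValueError("diff needs side='preimage' or 'postimage'")
    raise ValueError(f"unknown operation: {operation}")
```

Under this rule a single "git diff A B" has to consult two different
sets of attributes, one per side.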

But the last example has some practical issues.  Very often, people
retroactively define attributes to correct earlier mistakes.  If an
older tree A forgot to declare that a path mybank.gnucash is a
GnuCash ledger file, while a newer tree B (and the current checkout
that is even newer) does [*1*], it is more useful in practice to
apply the newer definition from .gitattributes to both trees.  You
are also much less likely to have an ancient tree checked out while
running "git diff A B" to compare two trees that are newer than the
current checkout.  Using the file from the working tree is the best
approximation of "we want to use the newer one", both from the
semantics point of view (i.e. you are likely to have a fresher tree
checked out) and from the performance point of view (i.e. reading
from files in the working tree is far cheaper than reading from
historical trees).

So it is not so cut-and-dried that "take the attributes from the
same place" is a good and "correct" definition [*2*].

[Footnote]

*1* GnuCash writes, by default, a gzip compressed xml file, so I
have in my .gitattributes file

	*.gnucash	filter=gnucash

and then in my .git/config

	[filter "gnucash"]
		clean = gzip -dc
		smudge = gzip -c

This allows "git diff" to work reasonably well (if you do not mind
reading diff between two versions of xml files, that is) and also
helps delta compression when packing the repository.
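
As an illustration of what the two filter commands do to the data,
here is a small Python sketch that stands in for "gzip -dc" and
"gzip -c" using the standard gzip module.  The function names and
the sample ledger contents are made up; git itself just runs the
configured commands.

```python
import gzip


def clean(worktree_bytes):
    """Stand-in for "gzip -dc": working tree file -> stored blob."""
    return gzip.decompress(worktree_bytes)


def smudge(blob_bytes):
    """Stand-in for "gzip -c": stored blob -> working tree file."""
    # mtime=0 keeps the compressed output reproducible across runs.
    return gzip.compress(blob_bytes, mtime=0)


# Made-up ledger contents, only for demonstration.
ledger_xml = b"<ledger><entry>1</entry></ledger>\n"

on_disk = smudge(ledger_xml)   # compressed, as GnuCash writes it
in_repo = clean(on_disk)       # plain XML, as git stores it

assert in_repo == ledger_xml
```

The repository ends up holding the plain XML, which is what "git diff"
and the delta compression get to work with.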

*2* Besides, the attributes are primarily used to define the
semantics of the contents in question.  If a path is of "gnucash"
kind (i.e. has the "filter=gnucash" attribute in the previous
example) in one tree, and the same path is of a different kind
(e.g. "filter=ooo" that says "this is an Ooo file") in the other
tree, it is very likely that it does not even make sense, with or
without content filtering, to compare them with "git diff".  So
"take the attributes from the same place" would have to imply "if
the attributes do not match, say something similar to 'Binary files
differ'", which is just as useless as applying one attribute taken
from a convenient but random place (i.e. the working tree).