Re: [PATCH 2/2] Add keyword unexpansion support to convert.c

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 17 Apr 2007, Junio C Hamano wrote:

Nicolas Pitre <nico@xxxxxxx> writes:

I would like to, however this doesn't currently integrate
well with git. I've been told in the past that once
.gitattributes is in place then the hooks for the crlf stuff
can be generalized to allow for calls out to custom code to
do this sort of thing.

And I agree that this is a perfectly sensible thing to do.  The facility
should be there for you to apply any kind of transformation with
external tools on data going in or out from Git.  There are good and bad
things you can do with such a facility, but at least it becomes your
responsibility to screw^H^H^H^Hfilter your data and not something that
is enforced by Git itself.

You have to be careful, though.  Depending on what kind of
transformation you implement with the external tools, you would
end up having to slow down everything we would do.

you can slow down everything that you do on your system if you defined too much work for external tools, that won't slow other people down who don't define any work for external tools.

It boils down to this statement from Andy:

   ..., keywords (in other VCSs, and so why not in git) are
   only updated when a file is checked out.  There is no need
   to touch every file.  It's actually beneficial, because the
   keyword in the file is the state of the file at the time it
   was checked in - which is actually more useful than updating
   it to the latest commit every time.

   That means you're only ever expanding in a file that your
   changing anyway - so it's effectively free.  git-checkout
   would still be immediate and instantaneous.

Back up a bit and think what "when a file is checked out" means.
His argument assumes the current behaviour of not checking out
when the underlying blob objects before munging are the same.

correct.

But with keyword expansion and fancier "external tools" whose
semantics are not well defined (iow, defined to be "do whatever
they please"), does it still make sense to consider two blobs
that appear in totally different context "the same" and omit
checking out (and causing the external tools hook not getting
run)?  I already pointed out to Andy that the branch name the
file was taken from, if it were to take part of the keyword
expansion, would come out incorrectly in his printed svg
drawing.

this is part of the rope you are handing out. the external tool could do a lot of things that don't make sense. you could have the tool include the serial number of the cpu you happen to be running on at the moment, it wouldn't make sense to do this, but it could be done. the fact that the rope could be used to hang someone doesn't mean that you should outlaw rope.

If you want somebody's earlier example of "giving a file with
embedded keyword to somebody, who modifies and sends the result
back in full, now you would want to incorporate the change by
identifying the origin" to work, you would want "$Source$" (I am
looking at CVS documentation, "Keyword substitution/Keyword
List") to identify where that file came from (after all, a
source tree could have duplicated files) so that you can tell
which file the update is about, and this keyword would expand
differently depending on where in the project tree the blob
appears.

It is not just the checkout codepath.  We omit diffs when we
know from SHA-1 that the blobs are the same before decoration.
We even omit diffs when we know from SHA-1 that two trees are
the same without taking possible decorations that can be applied
differently to the blobs they contain into account.  Earlier,
Andy said he wanted to grep for the expanded text if he is
grepping in the working tree, and I think that makes sense, but
that means git-grep cannot do the same "borrow from working tree
when expanding from blob object is more expensive" optimization
we have for diff.  We also need to disable that optimization
from the diff, regardless of what the correct semantics for
grepping in working trees should be.

git would not be able to borrow from the working tree just becouse the index thinks that the file is the same (and frankly, I'm not sure this is really a safe thing to do in any case, it's just something that works frequently enough that we get away with it)

the diff optimizations could (and should) stay.

I suspect that you would have to play safe and say "when
external tools are involved, we need to disable the existing
content SHA-1 based optimization for all paths that ask for
them" to keep your sanity.

Andy and I are both expecting that if the blobs are the same that none of the git tools would flag them as different. this maintains the huge speedups that git achieves by doing these checks.

if you want to make an option somewhere that disables this optimization, I guess it would be Ok, but I wouldn't do so until someone came up with a situation where they really needed it, nothing in what Andy or I have asked for needs this.

both of us are treating the keyword expansion as decorations to the file. it's useful, but the core meaning of 'what this file is' is the checked in version with the keywords unexpanded. all the optmizations that only look at what's checked in will remain as valid as they are today, it's only things that look at your working directory that would change.

David Lang
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]