Re: [PATCH 2/2] Add keyword unexpansion support to convert.c

Rogan Dawes <lists@xxxxxxxxxxxx> · Wed, 18 Apr 2007 17:38:54 +0200

Linus Torvalds wrote:

On Wed, 18 Apr 2007, Rogan Dawes wrote:
Or similarly, when checking an "ODF" file in, the attribute would lead to an
appropriate script creating the "tree" of individual files.

Does this sound workable?

I think it sounds very interesting, and I'd much rather do _those_ kinds 
of rewrites than keyword unexpansion. And yes, some kind of generic 
support for rewriting might give people effectively the keywords they want 
(I think the CVS semantics are not likely to be logical, but people can 
probably do something that works for them), and at that point maybe the 
keyword discussion goes away too.

However, I don't know if it is "workable".

The thing is, it's easy enough (although potentially _very_ expensive) to 
run some per-file script at each commit and at each checkout. But there 
are some fundamental operations that are even more common:

 - checking for "file changed", aka the "git status" kind of thing

   Anything we do would have to follow the same "stat" rules, at a 
   minimum. You can *not* afford to have to check the file manually.

   So especially if you combine several pieces into one, or split one file 
   into several pieces, your index would have to contain the entry 
   that matches the _filesystem_ (because that's what the index is all 
   about), but then the *tree* would contain the pieces (or the single 
   entry that matches several filesystem entries).

Right. I would imagine that the script would have to take care of 
setting timestamps in the filesystem appropriately, as well as passing 
them back to git when queried.

e.g. expanding test.odf/: (since we store it as a directory)

git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"

odf checkout calls back into git to find out the details of the files 
under test.odf/, and creates a zip file containing the individual files, 
with appropriate timestamps.
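
A minimal sketch of that checkout step, written in Python rather than shell. The name "pack_checkout" and the shape of its "members" argument are assumptions for illustration only; the thread does not say how the script would receive file contents or timestamps from git, so they are plain arguments here.

```python
import io
import zipfile


def pack_checkout(members):
    """Build a zip archive in memory from (name, data, date_time) tuples.

    date_time is the 6-tuple (year, month, day, hour, minute, second) that
    the zip format stores per member. Where these values come from (a
    callback into git, arguments, stdin) is left open, as in the thread.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data, date_time in members:
            # ZipInfo lets us set the stored timestamp explicitly instead
            # of taking the current time.
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()
```

Note that zip timestamps have two-second resolution and no timezone, so any stat information derived from them would be coarser than what the filesystem reports.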

User then opens the file using OO.o or whatever, makes some changes and 
saves the file.

The user then runs git status:

git calls "odf.sh stat test.odf/" (again, triggered by an attribute)

odf.sh does the equivalent of "unzip -l" to get up-to-date stat info for 
the component files, and passes it back to git (via stdout?)
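
As a rough illustration of the stat step, the listing could be produced like this in Python. "stat_members" is a hypothetical helper, and the tuple it returns is an assumed format; the thread leaves open what git would actually expect back.

```python
import io
import zipfile


def stat_members(archive_bytes):
    """Return (name, size, date_time) for each member of the archive.

    Only the zip central directory is read; member contents are not
    decompressed, which is what keeps this step cheap.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return [(zi.filename, zi.file_size, zi.date_time) for zi in zf.infolist()]
```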

User commits his changes:

git calls "odf.sh checkin test.odf/"

odf.sh unpacks the individual files, calls back into git to create 
individual objects (using a fast-import-alike protocol over stdout?)
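
A sketch of the checkin step, again in Python and again with hypothetical names ("git_blob_id", "checkin_members"). It only computes the blob ids git would assign to the unpacked members; actually writing the objects, and the protocol for doing so, is the open question in the paragraph above and is not attempted here.

```python
import hashlib
import io
import zipfile


def git_blob_id(data):
    """Return the SHA-1 object name git assigns to a blob with this content."""
    # A blob is hashed as the header "blob <size>\0" followed by the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def checkin_members(archive_bytes):
    """Map each member name to the blob id of its unpacked content."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {name: git_blob_id(zf.read(name)) for name in zf.namelist()}
```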

 - what about diffs (once the stat information says something has 
   potentially changed)? You'd have to script those too, and it really 
   sounds like some very basic operations get a _lot_ more expensive and 
   complex.

   This is also related to the above: one of the most fundamental diffs is 
   the diff of the index and a tree - so if the index matches the 
   "filesystem state" and the trees contain some "combined entry" or 
   "split entry", you'd have to teach some very core diff functionality 
   about that kind of mapping.

In other words, I think it's too complicated. Not necessarily impossible, 
but likely harder and more complex than it's really worth.

Having a 1:1 file mapping (like the CRLF<->LF object mapping is) is a lot 
easier. You just have to make sure that the index has the *stat* 
information from the filesystem, but the *sha1* identity information from 
the git internal format, and things automatically just fall out right. But 
if you have anything but a 1:1 relationship, it gets hugely more complex.
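
The 1:1 case can be sketched in a few lines of Python. This is a simplification under stated assumptions: "index_entry" and "maybe_changed" are hypothetical names, the entry keeps only size and mtime (the real index records more stat fields), and the CRLF->LF conversion is a plain byte replacement.

```python
import hashlib
import os


def git_blob_id(data):
    """Return the SHA-1 object name git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def index_entry(path):
    """Pair filesystem stat data with the id of the internal (LF) content."""
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    # The sha1 comes from the git-internal form, not the working-tree bytes.
    normalized = data.replace(b"\r\n", b"\n")
    return {
        "size": st.st_size,          # stat info: matches the filesystem
        "mtime_ns": st.st_mtime_ns,
        "sha1": git_blob_id(normalized),
    }


def maybe_changed(path, entry):
    """Cheap "file changed" check: stat the file, never read it."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns) != (entry["size"], entry["mtime_ns"])
```

Because one working-tree file maps to exactly one entry, the stat comparison and the sha1 comparison against a tree both operate on the same record. With a split or combined mapping there is no single record that can serve both purposes.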

			Linus

Absolutely. I just raised it now since it was originally mentioned quite 
a long time ago as a possible feature of git, and I couldn't see how it 
might work.

Thanks for your time,

Rogan