RCS Keywords in Git done right


 



Junio, et al.,

I've completed my first pass at RCS Keywords in Git. I believe I've
come up with a solution that is accurate, performant and complete,
though I have not yet tested it on big repos (I'm doing that today).

https://github.com/derekm/git-keywords

This work basically takes advantage of all the state-machine
transitions in git to surgically perform "git update-index $(git
archive $(git log -1 --format=%H @ -- $path) -- $path | tar vx)"
overwrites in the work tree.

It also exposes some state transitions that are entirely absent from
git, which creates a few edge cases. Humans doing development
workflows can trigger them by cancelling certain operations. Every
edge case just leaves you with un-substituted files, which become
substituted again after the next checkout, commit, merge, rewrite,
etc., so they are relatively unimportant if your deployed git repos
are managed by an automated system.
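For readers who find the one-liner above hard to parse, here is a
minimal sketch of the same three steps for a single file. The function
name "resubstitute" is hypothetical and is not part of git-keywords.
It assumes it is run from the top of the work tree, and that any
keyword expansion happens inside "git archive" (for example via the
export-subst attribute, which is an assumption about how the repo is
configured).

```shell
# Hypothetical helper, not part of git-keywords.
# Re-extract one tracked file from the last commit that touched it,
# then refresh its index entry.
resubstitute() {
    path=$1

    # 1. Find the most recent commit that changed this path.
    commit=$(git log -1 --format=%H @ -- "$path") || return 1
    [ -n "$commit" ] || return 1    # path has no history

    # 2. Export the file as of that commit and overwrite the
    #    work-tree copy.
    git archive "$commit" -- "$path" | tar xf - || return 1

    # 3. Tell the index about the rewritten file.
    git update-index -- "$path"
}
```

Usage would be "resubstitute README" or similar. Unlike the one-liner,
this variant quotes the path, so names containing spaces should
survive.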

Only $Author$, $Date$ and $Revision$ can be emulated presently. $Id$
and other tags requiring filename paths or basenames are possible, but
would require changes internal to git allowing "pretty format" codes
inside a file to triangulate filenames from blob hash and commit hash
pairs.

I believe this work fundamentally proves that the theory of RCS
keywords is sound in the context of Git, and that full support in
git-core is entirely achievable in short order. In fact, other areas
of git would improve if git devs adopted some of the results of this
work.

There is a lot of gainsaying and knee-jerk reaction to the idea of
keywords under distributed development. It rests on the fallacy of
thinking in terms of a shared, universal linear history instead of in
terms of relative spacetime events.

Keyword substitution can be done accurately relative to the history of
the possessor of that history. Last-edit timestamps, last authors and
revision IDs are important to many workflows inside and outside
development.

Of the keywords emulated, the only thing I couldn't achieve
(obviously) was monotonically increasing revision numbers. Instead I
went with the short hash of the file's most recent commit (which is
more proper for git anyway).
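To see what values the three keywords stand for on a given file, a
sketch like the following can be used. The helper name
"keyword_values" is hypothetical; it only queries "git log" with
standard pretty-format codes and does not claim to reproduce what the
git-keywords scripts do internally.

```shell
# Hypothetical helper, not part of git-keywords.
# Print the author, date and short hash of the most recent commit
# that touched one path.
keyword_values() {
    path=$1
    # %an = author name, %ad = author date, %h = abbreviated hash
    git log -1 --format='Author: %an%nDate: %ad%nRevision: %h' \
        @ -- "$path"
}
```

Running "keyword_values somefile" from inside the clone prints three
lines, one per keyword.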

To test it out...

1) clone the repo:

git clone https://github.com/derekm/git-keywords

2) cd into the repo and set up the hooks:

ln -sf ../../post-checkout-filter.pl .git/hooks/post-checkout
ln -sf ../../pre-commit-check.pl .git/hooks/pre-commit
ln -sf ../../post-commit-filter.pl .git/hooks/post-commit
ln -sf ../../post-merge-filter.pl .git/hooks/post-merge
ln -sf ../../post-rewrite-filter.pl .git/hooks/post-rewrite
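Relative symlinks like these are easy to get wrong, so it may be worth
checking that each hook resolves to an executable file. The function
below is a hypothetical sanity check, not something shipped with
git-keywords. It assumes it is run from the top of the clone.

```shell
# Hypothetical check, not part of git-keywords.
# Report any hook that is missing, dangling, or not executable.
check_hooks() {
    status=0
    for hook in post-checkout pre-commit post-commit \
                post-merge post-rewrite; do
        # -x follows symlinks, so a dangling link fails here too.
        if [ -x ".git/hooks/$hook" ]; then
            echo "ok: $hook"
        else
            echo "missing or not executable: $hook"
            status=1
        fi
    done
    return $status
}
```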

3) edit .git/config and set up the filters:

[filter "keywords"]
        smudge = ./keyword-smudge.pl %f
        clean = ./keyword-clean.pl
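If you would rather not edit .git/config by hand, the same two
settings can be written with "git config". This is a sketch of an
equivalent, wrapped in a hypothetical function name; it assumes it is
run inside the clone. Note that a filter only takes effect for paths
that a gitattributes entry assigns to it (filter=keywords); whether
the repo already ships such an entry is not something this sketch
checks.

```shell
# Hypothetical helper, not part of git-keywords.
# Write the [filter "keywords"] section into this repo's .git/config.
setup_keyword_filter() {
    git config filter.keywords.smudge './keyword-smudge.pl %f'
    git config filter.keywords.clean './keyword-clean.pl'
}
```

Afterwards "git config --get-regexp '^filter\.keywords\.'" should list
both entries.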

4) inspect the lack of substitutions:

head -4 *

5) initialize the repo with first substitutions:

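# For each top-level entry in HEAD, re-extract it from the last commit
# that touched it and refresh the index for the extracted files.
for i in $(git ls-tree --name-only @); do
 git update-index \
  $(git archive \
   $(git log -1 --format=%H @ -- "$i") -- "$i" | tar vx)
done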

6) inspect the presence of substitutions:

head -4 *

7) ??? (start hacking, try to break it, etc.)

8) PROFIT!

PS: I may consider rewriting the hooks in Bash, but I need to audit
what commands are available under msys-git.



