Re: git smudge filter fails

Stephen Morton <stephen.c.morton@xxxxxxxxx> · Tue, 15 Mar 2016 12:17:16 -0400

On Thu, Mar 10, 2016 at 5:04 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> On Thu, Mar 10, 2016 at 09:45:19AM -0500, Stephen Morton wrote:
>>
>>> I am a bit confused because this is basically the example used in
>>> ProGit [1] and it is fundamentally broken. In fact, if I understand
>>> correctly, this means that smudge filters cannot be relied upon to
>>> provide any 'keyword expansion' type tasks because they will all by
>>> nature have to query the file with 'git log'.
>>
>> Interesting. Perhaps I am missing something (I am far from an expert in
>> clean/smudge filters, which I do not generally use myself), but the
>> example in ProGit looks kind of bogus to me. I don't think it ever would
>> have worked reliably, under any version of git.
>>
>>> (Note that although in my example I used 'git checkout', with an only
>>> slightly more complicated example I can make it fail on 'git pull'
>>> which is perhaps a much more realistic use case. That was probably
>>> implied in your answer, I just wanted to mention it.)
>>
>> Yeah, I think the issue is basically the same for several commands which
>> update the worktree and the HEAD. Most of them are going to do the
>> worktree first.
>
> You can have a pair of branches A and B that have forked a long time
> ago, and have a path F that has been changed identically since they
> forked, perhaps by cherry-picking the same change.  This happens all
> the time.
>
> If there were some mechanism that modifies the checked out version
> of F with information that depends on the history that leads to A
> (e.g. "which commit that is an ancestor of A last modified F?")
> when you check out branch A, it will become invalid when you do "git
> checkout B", because the path F will not change because they are the
> same between the branches.  In short, CVS $Id$-style substitutions
> that depend on the history fundamentally do not work, unless you
> are willing to always rewrite the whole working tree every time you
> switch branches.
>
> The smudge and clean filters are given _only_ the blob contents and
> nothing else, not "which commit (or tree) the blob is taken from",
> not "which path this blob sits in that tree-ish", not "what branch
> am I on" and this is a very much deliberate design decision made in
> order to avoid leading people to a misguided attempt to mimic CVS
> $Id$-style substitutions.
>
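
The failure described above can be reproduced with a toy model. This
is only a sketch in Python: smudge(), checkout(), the tree
dictionaries and the commit names a1/a2/b1/b2 are all made up for
illustration and are not git internals. The one assumption carried
over from the explanation is that a checkout rewrites only the paths
whose blob differs between the two trees.

```python
# Toy model only: none of these names correspond to git internals.

def smudge(text, commit):
    """Expand a $Id$ keyword with the commit that last touched the file."""
    return text.replace("$Id$", f"$Id: {commit}$")


def checkout(worktree, old_tree, new_tree, last_commit):
    """Rewrite only the paths whose blob differs between the two trees."""
    for path, (blob_id, text) in new_tree.items():
        if old_tree.get(path, (None, None))[0] == blob_id:
            continue  # identical blob: the file on disk is left alone
        worktree[path] = smudge(text, last_commit[path])
    return worktree


# F is the same blob on both branches; G differs.
tree_a = {"F": ("blob1", "rev $Id$\n"), "G": ("blob2", "data $Id$\n")}
tree_b = {"F": ("blob1", "rev $Id$\n"), "G": ("blob3", "data $Id$\n")}

# Hypothetical "last commit to touch this path" per branch.
last_a = {"F": "a1", "G": "a2"}
last_b = {"F": "b1", "G": "b2"}

worktree = checkout({}, {}, tree_a, last_a)            # check out A
worktree = checkout(worktree, tree_a, tree_b, last_b)  # switch to B

print(worktree["F"], end="")  # still carries A's commit, not b1
print(worktree["G"], end="")  # rewritten, carries b2
```

After the switch to B, F still shows the expansion computed for A,
because nothing caused it to be rewritten.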

I will raise an Issue with ProGit.

It's perhaps beyond the scope of my original question, but for
situations where I need a "last change date" embedded in a file (e.g.
because a protocol standard requires it), is there any recommended way
to do so? We've learned the hard way that hardcoding makes
merging/cherry-picking a bit of a nightmare and should be avoided. Is
a post-checkout hook the way to go? I've actually found the smudge
filter to be very slow for this application as each file is processed
in series; a post-commit hook that could operate on files in parallel
would likely be substantially faster.
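
To make the question concrete, a hook along these lines is roughly
what is meant. It is a sketch only, not a recommended solution: the
$LastChanged$ keyword, the TRACKED list and the function names are
hypothetical, and it assumes a matching clean filter (collapse()
below) strips the date again so that the expanded file does not show
up as modified.

```python
#!/usr/bin/env python3
"""Sketch of a post-checkout hook; keyword and file list are hypothetical."""
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

TRACKED = ["spec/header.txt"]  # hypothetical files carrying the keyword
KEYWORD = re.compile(r"\$LastChanged(?::[^$\n]*)?\$")


def expand(text, date):
    """Replace $LastChanged$ (expanded or not) with the given date."""
    return KEYWORD.sub(lambda m: f"$LastChanged: {date}$", text)


def collapse(text):
    """Inverse of expand(); the logic a clean filter would apply."""
    return KEYWORD.sub("$LastChanged$", text)


def last_change_date(path):
    """Ask git for the committer date of the last commit touching path."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%cI", "--", path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def update(path):
    """Rewrite one file in place if its keyword expansion is out of date."""
    date = last_change_date(path)
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    new = expand(text, date)
    if new != text:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(new)


def main():
    # Hooks run from the top of the working tree, so relative paths work.
    # Each file is handled by its own worker instead of in series.
    with ThreadPoolExecutor() as pool:
        list(pool.map(update, TRACKED))

# When installed as .git/hooks/post-checkout, end the file with: main()
```

Because the hook runs after HEAD has been updated, "git log" sees the
new branch, and every listed file is reconsidered on each switch,
which is the "rewrite every time" cost mentioned above. A pull that
ends in a merge would presumably need the same script as a post-merge
hook as well.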

Stephen

(Sorry about the earlier top-posting. I didn't realize what gmail was
doing until after it had happened.)