Re: Transparently encrypt repository contents with GPG

Michael J Gruber <git@xxxxxxxxxxxxxxxxxxxx> · Mon, 16 Mar 2009 17:01:33 +0100

Junio C Hamano came, saw, and said on 14.03.2009 19:45:
> Michael J Gruber <git@xxxxxxxxxxxxxxxxxxxx> writes:
> 
>> Since both the cleaned and the smudged versions are supposed to be
>> "authoritative" (as opposed to the textconv'ed one), one may argue either
>> way about which is the right approach.
> 
> The smudged one can never be authoritative.  That is the whole point of smudge
> filter and in general the whole convert_to_working_tree() infrastructure.
> It changes depending on who you are (e.g. on what platform you are on).
> So running comparison between two clean versions is the only sane thing to
> do.

Yes. I guess I'm being too much of a mathematician here: if clean is a
well-defined function, then clean(x) is determined by x, the smudged
version. In that sense x is equally authoritative.
Moreover, if smudge is the inverse of clean, i.e. smudge and clean are
bijections, then x differs from y iff clean(x) differs from clean(y).
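
A toy model of this argument, as a sketch under stated assumptions: the clean/smudge pair below is hypothetical and converts line endings (LF in the repository, CRLF in the work tree), loosely in the spirit of core.autocrlf. It shows that clean is injective on smudged content, but not on arbitrary work-tree bytes, which is why only the clean side is platform independent.

```python
# Hypothetical clean/smudge pair modelled on line-ending conversion:
# the repository stores LF, the work tree on this platform uses CRLF.
def clean(worktree_text: str) -> str:
    """Work tree -> repository: turn CRLF into LF."""
    return worktree_text.replace("\r\n", "\n")


def smudge(repo_text: str) -> str:
    """Repository -> work tree: turn LF into CRLF."""
    return repo_text.replace("\n", "\r\n")


# For LF-only repository content, clean undoes smudge, so on smudged
# files it is injective: they differ exactly when their clean forms do.
x, y = smudge("one\ntwo\n"), smudge("one\nthree\n")
assert clean(x) == "one\ntwo\n"
assert (x != y) == (clean(x) != clean(y))

# On arbitrary work-tree bytes clean is not injective: these two files
# differ, yet both clean to the same blob.
a, b = "one\r\ntwo\r\n", "one\ntwo\n"
assert a != b and clean(a) == clean(b)
```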

> You could argue textconv should work on smudged contents or on clean
> contents before smudging.  As long as it is done consistently, I do not
> care either way too deeply, as its output is not supposed to be used for
> anything but human consumption.  Two equally sane arrangements would be:
> 
>  (1) Start from two clean contents (run convert_to_git() if contents were
>      obtained from the work tree), run textconv, run diff, and output the
>      result literally; or
> 
>  (2) Start from two smudged contents (run convert_to_working_tree() for
>      contents taken from the repository), run textconv, run diff, and
>      run clean before sending the result to the output.
> 
> The former assumes a textconv filter that wants to work on clean
> contents, the latter one that expects smudged input.  I would probably
> suggest going with the former approach, as it is consistent with the
> general principle in other parts of the system (the internal processing
> happens on clean contents).
> 
> Both of the above assume that the output should come in clean form;
> it is consistent with the way normal diff is generated for consumption by
> git-apply. You can certainly argue that the final output should be in
> smudged form when textconv is used, as it is purely for human consumption,
> and is not even supposed to be fed to apply.

Also, I don't expect clean to be meaningful, in general, when applied to
the result of textconv, and even less so when applied to the output of diff.
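
A minimal sketch of the two arrangements above, under loud assumptions: convert_to_git and convert_to_working_tree here are hypothetical Python stand-ins for git's C functions of the same names, the clean/smudge pair is modelled as ROT13 (its own inverse, standing in for an encrypt/decrypt pair), and textconv is a hypothetical pass-through. difflib.unified_diff stands in for git's diff.

```python
import codecs
import difflib


# Hypothetical filters for illustration only. ROT13 is its own inverse, so
# clean and smudge form a bijection, like an encrypt/decrypt pair.
def convert_to_git(text: str) -> str:            # clean
    return codecs.encode(text, "rot13")


def convert_to_working_tree(text: str) -> str:   # smudge
    return codecs.encode(text, "rot13")


def textconv(text: str) -> str:
    # Hypothetical textconv that passes contents through unchanged.
    return text


def diff(old: str, new: str) -> str:
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile="a/file", tofile="b/file"))


def arrangement_1(blob: str, worktree_file: str) -> str:
    # Clean both sides, run textconv, diff, and output the result literally.
    return diff(textconv(blob), textconv(convert_to_git(worktree_file)))


def arrangement_2(blob: str, worktree_file: str) -> str:
    # Smudge both sides, run textconv, diff, then clean the diff output.
    return convert_to_git(diff(textconv(convert_to_working_tree(blob)),
                               textconv(worktree_file)))


blob = convert_to_git("one\ntwo\n")   # what the repository stores
worktree_file = "one\nthree\n"        # edited checkout
print(arrangement_1(blob, worktree_file))
print(arrangement_2(blob, worktree_file))
```

With this toy filter the hunks of both outputs agree, but running clean over the diff in (2) also rewrites the "a/file" and "b/file" headers, one concrete way in which clean need not be meaningful on diff output.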

Now, a simple test shows that git diff does exactly this when diffing
HEAD against the worktree:

diff between HEAD and clean(worktree)

That is the right thing. It just seems that textconv is not even
called "in the wrong place in the chain"; rather, it messes up the diff in
this way:

diff between textconv(HEAD) and textconv(worktree)

(I had first expected clean(textconv(worktree)), which would be wrong, too.)
That is, the clean filter is ignored completely in the presence of textconv.
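
A toy reproduction of the three pipelines, as a sketch only: clean and smudge are again modelled as hypothetical ROT13 filters (standing in for encryption), textconv is a hypothetical pass-through like "cat", and difflib stands in for git's diff. For an unmodified checkout, the plain diff and the "clean, then textconv" order both come out empty, while skipping clean reports every line as changed.

```python
import codecs
import difflib


def clean(text: str) -> str:
    # Hypothetical clean filter: ROT13 stands in for encryption; being its
    # own inverse, the same function also serves as smudge.
    return codecs.encode(text, "rot13")


smudge = clean


def textconv(text: str) -> str:
    # Hypothetical textconv that shows contents unchanged, like "cat".
    return text


def diff(old: str, new: str) -> list[str]:
    return list(difflib.unified_diff(old.splitlines(keepends=True),
                                     new.splitlines(keepends=True)))


head = clean("hello\nworld\n")   # blob in HEAD, stored in clean form
worktree = smudge(head)          # unmodified checkout of that blob

plain = diff(head, clean(worktree))                          # diff without textconv
observed = diff(textconv(head), textconv(worktree))          # clean skipped
expected = diff(textconv(head), textconv(clean(worktree)))   # clean, then textconv

assert plain == [] and expected == []
assert observed != []   # every line shows up as changed
```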

OK, I'll stop bugging you until I've checked the existing tests and the
code...

Michael