Re: Possible regression in git-rev-list --header

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Tue, 2 Jan 2007 22:32:24 +0100 (CET)

Hi,

On Sun, 31 Dec 2006, Junio C Hamano wrote:

> "Marco Costalba" <mcostalba@xxxxxxxxx> writes:
> 
> > On 12/31/06, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
> >>
> >> Further, if you rely on parsing being super-fast, why not just parse
> >> _only_ the header information that you actually need? The header still
> >> consists of
> >>
> >>         - exactly one "tree",
> >>         - an arbitrary amount of "parent" lines,
> >>         - exactly one "author", and
> >>         - exactly one "committer" line
> >>
> >> After that may come optional headers,
> 
> They are more like 'other' headers.

I should have been more clear: optional for the committer.

> > If you introduce the concept of an 'optional header part' you
> > logically and naturally _may_ also introduce the concept of disabling
> > the display of _that_ optional header, or better, to keep back
> > compatibility...
> 
> While I am somewhat sympathetic, and am willing to apologize for
> trying to advance the i18n support without enough advance
> warning, I think you already know what you are saying does not
> make much sense in the larger picture and as the longer term
> solution.

Besides, when you say

> The problem with your proposed algorithm is that you don't have _one_ 
> commit but a sequence of commits to parse, so when you have parsed until 
> the committer line you must need to know where the next commit starts, 
> IOW you have to find the next '\0', that's what I was trying to expose 
> in my previous e-mail postscriptum.

I don't get _at all_ what could be a problem _with_ the encoding header 
that is no problem _without_ it. I assume you want to tell me something 
more than that you do not want to change your code? If so, I missed it.
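
To make the point concrete, here is a rough sketch of the kind of parser I have in mind. It is an illustration only, not code from git or from your viewer: the function name is made up, and it assumes the --header output is a sequence of NUL-terminated records, each starting with the commit id on its own line, followed by the header lines, an empty line, and the message.

```python
def parse_rev_list_header(data: bytes):
    """Parse a sequence of NUL-terminated commit records.

    Hypothetical helper: only the headers we care about are kept,
    everything else (e.g. "encoding") is skipped without special casing.
    """
    commits = []
    # The NUL has to be found in any case, with or without extra headers.
    for record in data.split(b"\0"):
        record = record.lstrip(b"\n")
        if not record:
            continue
        # Headers end at the first empty line; the rest is the message.
        head, _, message = record.partition(b"\n\n")
        lines = head.split(b"\n")
        commit = {
            "id": lines[0].decode("ascii"),
            "parents": [],
            "message": message,
        }
        for line in lines[1:]:
            key, _, value = line.partition(b" ")
            if key == b"parent":
                commit["parents"].append(value.decode("ascii"))
            elif key in (b"tree", b"author", b"committer"):
                commit[key.decode("ascii")] = value
            # Any other header line is ignored here.
        commits.append(commit)
    return commits
```

The search for the end of a record is the same whether or not an "encoding" line is present; the only difference is one more line that falls through the loop untouched.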

>  * When the output encoding conversion is done successfully, the
>    current tip of master drops "encoding" header from the
>    output, [...]

Earlier, I said that I do not feel strongly about that issue.

But now I do.

If you drop the "encoding" header from the commit buffer, just because you 
reencoded it to whatever encoding happens to be the one the caller just 
asked for, you are _not_ interpreting the data, but _changing_ it.

That is not what git is about, IMHO. It would be a completely different 
thing if the caller had a way to ask for _specific_ headers, and asks to 
be left alone with all the other cruft. But the caller does not even have 
the chance to say that, let alone ask specifically _for_ it.

The encoding header bears information, just like the tree header or the 
committer header. I find it highly irritating that I am shielded from it. 
The encoding header has _nothing_ to do with the encoding that the output 
is being encoded with, but _all_ with how the commit message was encoded 
_by the committer_.
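
As a small illustration of why a caller might want that information, here is a sketch of a consumer reading the raw, un-reencoded commit. The helper name and the dictionary of headers are my assumptions for the example, not anything git provides; the fallback to UTF-8 reflects that a commit without an "encoding" header is taken to be UTF-8.

```python
def decode_message(headers: dict, message: bytes) -> str:
    """Decode a raw commit message using the committer's recorded encoding.

    Hypothetical helper: `headers` maps header names to string values,
    e.g. {"encoding": "ISO-8859-1"}.
    """
    encoding = headers.get("encoding", "utf-8")
    try:
        return message.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not match it:
        # fall back to UTF-8 and mark undecodable bytes.
        return message.decode("utf-8", errors="replace")
```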

> The reason we did the latter, by the way, does not have anything
> to do with helping broken parsers.  We drop the header after
> re-coding the log message into an encoding specified by the user
> (which is presumably different from what the commit was
> originally recorded in) because the encoding recorded on
> "encoding" header would not match the re-coded log message
> anymore.

By the same reasoning, you'd have to rewrite the committer line to reflect 
the current GIT_COMMITTER_IDENT, or hide it. If you want to convince me, 
you have to try harder.

And Marco has to fix the header parsing anyway.

So, please, Junio, can you rethink that decision?

Ciao,
Dscho
