Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual

Junio C Hamano <junkio@xxxxxxx> · Tue, 15 May 2007 11:41:01 -0700

Jeff King <peff@xxxxxxxx> writes:

> Unfortunately, I don't think we have the encoding information any more
> at that point. We can infer how the patch was generated by looking at
> the git-config, and that should be right 99% of the time (unless the
> patches were generated with a different config, either from another repo
> or before some settings were changed).
>
> Junio, can you confirm my understanding that:
>   - if i18n.logOutputEncoding is set, then we are definitely in that
>     encoding
>   - otherwise, if i18n.commitEncoding is set, we should assume commits are
>     in that encoding (which is just a guess, since they may have been
>     generated on another config, but it's our best guess)
>   - otherwise, assume utf-8

I do not want to break projects whose members consistently use a
single non-UTF-8 encoding, and I've been hoping that in such a
use case they should not have to set any of these encoding
configuration variables.  So in that sense I would be somewhat
reluctant to agree with the last one.  But I am getting a feeling
that it is a losing battle.

On the patch acceptance side, when we do _not_ have encoding
information and the input does not look like valid UTF-8, we
assume that the input is latin-1 and convert it to UTF-8, if I
recall correctly.  If somebody sent you a patch without an
encoding header and you are forwarding that patch, the best
thing is certainly not to add anything ourselves (because we do
not know) and to let the receiving end do that conversion; but
if we _were_ to add anything, I suspect it would be a better
idea to use the same logic and default to latin-1 or UTF-8.
East Asian users may want to raise objections here.
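
To illustrate the fallback I am describing on the acceptance
side, here is a rough sketch in Python.  It is not git's actual
code; the function name and the `declared` parameter are made up
for illustration, and it only shows the order of the guesses
(declared encoding, then UTF-8, then latin-1).

```python
from typing import Optional


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Return `raw` re-encoded as UTF-8 (hypothetical helper)."""
    if declared is not None:
        # The message told us its encoding, so trust it.
        return raw.decode(declared).encode("utf-8")
    try:
        # No encoding information: keep the bytes if they
        # already decode cleanly as UTF-8.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume latin-1.  Every byte value is valid
        # latin-1, so this decode cannot fail, but the guess is
        # wrong for e.g. East Asian legacy encodings.
        return raw.decode("latin-1").encode("utf-8")
```

Because the latin-1 branch never fails, a wrong guess goes
unnoticed and produces garbage, which is why users of other
legacy encodings would want an explicit header instead.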

I think it is a reasonable compromise to do it the way you
outlined.  Doing it at patch generation time would fix the
ambiguity issues at step 2, so it might turn out to be
necessary to add the encoding header to format-patch output
after all, but send-email needs to be able to handle messages
that do not have the header anyway, so probably the first step
is to handle this in send-email.
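
The order of guesses you outlined could be sketched like this in
Python.  This is only an illustration under assumptions: the
function name is hypothetical, and the configuration is passed in
as a plain dict rather than read from git-config.

```python
def guess_patch_encoding(config: dict) -> str:
    """Guess the encoding of a patch that carries no header."""
    # git-config section and variable names are case-insensitive,
    # so normalize the keys before looking them up.
    lowered = {key.lower(): value for key, value in config.items()}
    for key in ("i18n.logoutputencoding", "i18n.commitencoding"):
        value = lowered.get(key)
        if value:
            # logOutputEncoding wins; commitEncoding is only a
            # best guess, since the patch may have been generated
            # under a different configuration.
            return value
    # Nothing configured: assume UTF-8.
    return "utf-8"
```

For example, with only i18n.commitEncoding set to ISO-8859-1 the
function returns "ISO-8859-1", and with an empty configuration it
returns "utf-8".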

When we update format-patch, the ambiguity at step 2 would
disappear.  My gut feeling is that adding an extra header to
format-patch output would break neither people's workflows nor
their scripts (I do not think it would break mine, as I either
suck in only the body of the message to my MUA or use
send-email), but I am not sure.

> Also Junio, it looks like commit 7cbcf4d5 moved parsing of the
> --encoding parameter into setup_revisions, but it's still being checked
> for in cmd_log_init. Can you confirm that the latter is now superfluous
> and can be removed?

Thanks for noticing, and I think you are right.  The code parses
the same input and sets the same global variable the same way.
