Re: RFD: fast-import is picky with author names (and maybe it should - but how much so?)

Felipe Contreras <felipe.contreras@xxxxxxxxx> · Sun, 11 Nov 2012 18:45:32 +0100

On Sun, Nov 11, 2012 at 6:15 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Sun, Nov 11, 2012 at 12:00:44PM -0500, A Large Angry SCM wrote:

> If there is a standard filter, then what is the advantage in doing it as
> a pipe? Why not just teach fast-import the same trick (and possibly make
> it optional)? That would be simpler, more efficient, and it would make
> it easier for remote helpers to turn it on (they use a command-line
> switch rather than setting up an extra process).

Right, but instead of a command-line switch it probably should be
enabled on the stream:

  feature clean-authors

Or something.

> But what I don't understand is: what would such a standard filter look
> like? Fast-import (or a filter) would already receive the exporter's
> best attempt at a git-like ident string.

Currently, yeah, because there's no other option. It's either try to
clean it up, or fail.

But if 'git fast-import' has a superior alternative, I certainly would
remove my custom code and enable that feature.

> We can clean up and normalize
> things like whitespace (and we probably should if we do not do so
> already). But beyond that, we have no context about the name; only the
> exporter has that.

There is no context.

> So if we receive:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>
>
> or:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx <none@none>
>
> or:
>
>   Foo Bar<foo.bar@xxxxxxxxxxx
>
> what do we do with it? Is the first part a malformed name/email pair,
> and the second part is crap added by a lazy exporter? Or does the
> exporter want to keep the angle brackets as part of the name field? Is
> there a malformed email in the last one, or no email at all?

These are exactly the same questions every exporter must answer. And
there's no answer, because the field is not a git author, it's a
mercurial user, or a bazaar committer, or who knows what.

From whatever source, these all might be valid authors:
john
john <john@xxxxxxxxxxxx> (grease)
<test@xxxxxxxx>
test@xxxxxxxx
test<test@xxxxxxxx>
test <test@xxxxxxxx
test # a space
test < test@xxxxxxxx >
test >test@xxxxxxx>
test <test <at> test <dot> com>
<>
>
<
The first chapter of the LOTR

There is no context.

> The exporter is the only program that actually knows where the data came
> from,

It doesn't matter where it came from, it's not a name/email pair.

> how it should be broken down,

It cannot be broken down, it's free-form text. Any text.

> and what is appropriate for pulling
> data out of its particular source system.

This free-form text is the lowest granularity. There is nothing else.

> For that reason, the exporter
> has to be the place where we come up with a syntactically correct and
> unambiguous ident.

*If* the exporter is able to do this, sure, but many don't have any
more information.

See:

% hg commit -u 'Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>' -m one
% hg --debug log
changeset:   0:5ef37a2c773f02d0e01f1ecdcc59149832d294e8
tag:         tip
phase:       draft
parent:      -1:0000000000000000000000000000000000000000
parent:      -1:0000000000000000000000000000000000000000
manifest:    0:c6d4cd25b9fc2f83b0dd51f4acbea9486fce54d7
user:        Foo Bar<foo.bar@xxxxxxxxxxx> <none@none>
date:        Sun Nov 11 18:33:00 2012 +0100
files+:      file
extra:       branch=default
description:
one

What is an hg exporter tool supposed to do with that?

What such a tool can do, 'git fast-import' can do.
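
To make that concrete, here is a rough sketch of the kind of guesswork an
exporter ends up doing today. It is not the code from any existing tool
and not something fast-import implements; the name clean_author and the
"unknown" placeholder email are made up for illustration, and other
heuristics would be just as defensible:

```python
import re

def clean_author(raw, default_email="unknown"):
    """Best-effort guess at a 'Name <email>' ident from free-form text.

    Hypothetical helper: the rules below are one arbitrary choice.
    """
    raw = raw.strip()
    m = re.match(r"([^<>]*)<([^<>]*)>", raw)
    if m:
        # 'name <email>' plus optional trailing text, which is dropped
        name, email = m.group(1), m.group(2)
    elif "<" in raw:
        # unterminated bracket: treat everything after the first '<' as email
        name, _, email = raw.partition("<")
    elif "@" in raw and " " not in raw:
        # a bare address with no name
        name, email = "", raw
    else:
        # no recognizable email at all, fall back to the placeholder
        name, email = raw, default_email
    # remove stray angle brackets and normalize whitespace in both fields
    name = " ".join(re.sub(r"[<>]", "", name).split())
    email = re.sub(r"[<>\s]", "", email)
    return "%s <%s>" % (name, email) if name else "<%s>" % email
```

None of this needs anything the exporter knows and fast-import doesn't:
the only input is the raw string. And it is still a guess; the trailing
"(grease)" in one of the examples above would simply be thrown away.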

> I am not opposed to adding a mailmap-like feature to fast-import to map
> identities, but it has to start with sane, unambiguous output from the
> exporter.

And if that's not possible?

-- 
Felipe Contreras