Re: RFD: fast-import is picky with author names (and maybe it should - but how much so?)

Felipe Contreras <felipe.contreras@xxxxxxxxx> · Sat, 10 Nov 2012 19:43:18 +0100

On Sat, Nov 10, 2012 at 6:28 PM, Michael J Gruber
<git@xxxxxxxxxxxxxxxxxxxx> wrote:
> Felipe Contreras venit, vidit, dixit 09.11.2012 15:34:
>> On Fri, Nov 9, 2012 at 10:28 AM, Michael J Gruber
>> <git@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Hg seems to store just anything in the author field ("committer"). The
>>> various interfaces that are floating around do some behind-the-back
>>> conversion to git format. The more conversions they do, the better they
>>> seem to work (no erroring out) but I'm wondering whether it's really a
>>> good thing, or whether we should encourage a more diligent approach
>>> which requires a user to map non-conforming author names wilfully.
>>
>> So you propose that when somebody does 'git clone hg::hg hg-git' the
>> thing should fail. I hope you don't think it's too unbecoming for me
>> to say that I disagree.
>
> There is no need to disagree with a proposal I haven't made. I would
> disagree with the proposal that I haven't made, too.

All right, we shouldn't encourage a more diligent approach which
requires a user to map author names then.

>> IMO it should be git fast-import the one that converts these bad
>> authors, not every single tool out there. Maybe throw a warning, but
>> that's all. Or maybe generate a list of bad authors ready to be filled
>> out. That way when a project is doing a real conversion, say, when
>> moving to git, they can run the conversion once and see which authors
>> are bad and not multiple times, each try taking longer than the next.
>
> As Jeff pointed out, git-fast-import expects output conforming to a
> certain standard, and that's not going to change. import is agnostic to
> where its import stream is coming from. Only the producer of that stream
> can have additional information about the provenience of the stream's
> data which may aid (possibly together with user input or choices) in
> transforming that into something conforming.

We already know where those import streams come from: mercurial,
bazaar, etc. There's absolutely nothing the tools exporting data from
those repositories can do, except try to convert all kinds of weird
names--and many tools do it poorly.

So, the options are:

a) Leave the name conversion to the export tools, and when they miss
some weird corner case, like 'Author <email', let the user face the
consequences, perhaps after an hour of the process.

We know there are sources of data that don't have git-formatted author
names, so we know every tool out there must do this checking.

In addition to that, the export tool gets to decide what to do when
one of these bad names appears, which in many cases probably means
doing nothing, so the user would not even see that such a bad name was
there, which might not be what they want.

b) Do the name conversion in fast-import itself, perhaps optionally,
so if a tool missed some weird corner case, the user does not have to
face the consequences.

The tool writers don't have to worry about this, so we would not have
tools out there doing a half-assed job of this.

And what happens when such bad names show up would be consistent
across tools: a warning, a scaffold mapping of bad names, etc.

One is bad for the users and the tool writers, only disadvantages;
the other is good for the users and the tool writers, only
advantages.
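
To make b) a bit more concrete, here is a rough sketch in Python of the
kind of lenient clean-up I mean. This is not what fast-import does
today, and the names sanitize_ident and scaffold_mapping are made up
for illustration. It assumes the target form is 'Name <email>', that an
empty '<>' is acceptable when there is no email, and that 'Unknown' is a
reasonable placeholder name; a real implementation might choose
differently.

```python
import re


def sanitize_ident(raw):
    """Coerce a free-form author string into 'Name <email>'.

    Returns (ident, changed). Hypothetical helper, not part of git.
    """
    raw = raw.replace("\n", " ").strip()
    lt = raw.find("<")
    if lt == -1:
        # No email part at all: treat the whole string as the name.
        name, email = raw, ""
    else:
        name = raw[:lt]
        rest = raw[lt + 1:]
        gt = rest.find(">")
        # Handles the 'Author <email' case (missing closing bracket).
        email = rest if gt == -1 else rest[:gt]
    # Drop stray angle brackets, and whitespace inside the email.
    name = re.sub(r"[<>]", "", name).strip()
    email = re.sub(r"[<>\s]", "", email)
    if not name:
        name = "Unknown"  # assumed placeholder
    ident = f"{name} <{email}>"
    return ident, ident != raw


def scaffold_mapping(raw_idents):
    """List 'bad = fixed' lines for every ident that had to be changed."""
    lines = []
    for raw in sorted(set(raw_idents)):
        ident, changed = sanitize_ident(raw)
        if changed:
            lines.append(f"{raw} = {ident}")
    return lines
```

Running scaffold_mapping over all the authors seen in one import would
give the user a single list to review and edit, instead of one failure
per attempt.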

-- 
Felipe Contreras