Re: [PATCH] Use GIT_COMMITTER_IDENT instead of hardcoded values in import-tars.perl

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Sun, 7 Sep 2008, Mike Hommey wrote:
>
>> -my $committer_name = 'T Ar Creator';
>> -my $committer_email = 'tar@xxxxxxxxxxx';
>> +chomp(my $committer_ident = `git var GIT_COMMITTER_IDENT`);
>> +die 'You need to set user name and email'
>> +	unless ($committer_ident =~ s/(.+ <[^>]+>).*/\1/);
>
> I have at least one script that will be broken by this change in behavior.
>
> To me, the issue is just like git-cvsimport, which sets the committer not 
> to the actual committer, so that two people can end up with identical 
> commit names, even if they cvsimported independently.  I'd like the same 
> behavior for import-tars.  I actually use it that way.

I sense there are conflicting goals here.

cvsimport has only partial information about the author (just the short
account name and nothing else), and by recording that as-is, without
embellishing it, you achieve reproducibility.  On the other extreme, you
can use an author mapping file, sacrificing reproducibility with other
people who do not have an identical mapping file, to obtain a more
readable resulting history with real names.  You can do both.

With the hardcoded 'T Ar Creator', you have no choice but strict
reproducibility without readable names.  With Mike's original patch to
make it in line with git-import.{sh,perl}, you still cannot have both,
because setting GIT_COMMITTER_NAME does not affect what the user.name
configuration says.  But with "git var GIT_COMMITTER_IDENT", you could.
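For concreteness, "git var GIT_COMMITTER_IDENT" prints the name, the
email, and a trailing timestamp with timezone on a single line, and the
patch strips that trailing part.  A minimal shell sketch of the same
parsing, using a hypothetical sample ident (a real script would capture
the command's output instead):

```shell
# Hypothetical sample of what "git var GIT_COMMITTER_IDENT" prints;
# a real script would run the command and capture its output.
ident='A U Thor <author@example.com> 1218885712 +0200'

# Keep only "Name <email>", dropping the timestamp and timezone,
# and die if the ident does not look like a usable identity.
committer=$(expr "$ident" : '\(.* <[^>]*>\)') ||
	{ echo >&2 'You need to set user name and email'; exit 1; }
echo "$committer"   # A U Thor <author@example.com>
```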

This makes me wonder if it might be a better design to:

 * Make fast-import feeders preserve as much information as possible
   from the source material, but none from the environment.  This is
   half-similar in spirit to what cvsimport does---it does not know the
   timezone so it always uses GMT, and it uses the short account name
   because that is the only thing available, but it does not use a
   hardcoded "cvs", and the environment can affect it further by setting
   up an author mapping file.  Here I am saying that a fast-import
   feeder shouldn't (and does not have to) take the environment into
   account when it does not have good data in the source material.

   In the context of importing tarballs, zipfiles, and an existing
   directory that is a tarball extract, there is not much authorship
   information in the source material (each entry in a tarball may carry
   owner information, but what if your tarball has more than one file,
   with different owners?).

 * Invent a fast-import stream filter that lets you munge authorship and
   committer information selectively.  Splice it into the pipeline
   between the feeder and fast-import if you want the resulting history
   to be more readable (e.g. by using an author mapping file).

   Or you can choose not to use such a filter, and get a reproducible
   result.
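To illustrate the splice point, here is a toy sketch; the feeder is
faked and all identities are hypothetical.  The filter rewrites
"committer" lines in the fast-import stream and passes everything else
through, and in real use its output would go into "git fast-import":

```shell
# "feeder" stands in for any fast-import frontend (import-tars etc.);
# here it fakes a single committer line of the stream it would emit.
feeder() {
	printf '%s\n' 'committer T Ar Creator <tar@example.com> 1218885712 +0200'
}

# Rewrite the placeholder committer to a real identity; in real use,
# pipe the result into "git fast-import" instead of printing it.
feeder |
sed -e 's/^committer T Ar Creator <tar@example.com>/committer A U Thor <author@example.com>/'
```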

If the "filter" turns out to be simple enough, it might even make sense
to make it part of fast-import itself, but that is an implementation
detail.
