Re: [PATCH v3 13/25] sequencer: prepare for rebase -i's commit functionality

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Thu, 13 Oct 2016 12:50:16 +0200 (CEST)

Hi Junio,

On Wed, 12 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> 
> >> Hmph, didn't we recently add parse_key_value_squoted() to build
> >> read_author_script() in builtin/am.c on top of it, so that this
> >> piece of code can also take advantage of and share the parser?
> >
> > I already pointed out that the author-script file may *not* be quoted.
> 
> I think my puzzlement comes from here.  What makes it OK for "am" to
> expect the contents of author-script file to be quoted but it is not
> OK to expect the same here?  What makes it not quoted for _this_
> reader, in other words?

The `git am` command is inherently *not* interactive, while the
interactive rebase, well, is.

As such, we must assume that enterprisey users did come up with scripts
that edit, or create, author-script files, and exploited the fact that the
interactive rebase previously sourced them.

Come to think of it, there is a bigger problem here, as users might have
abused the author-script to execute commands in rebase -i's own context.
Not sure we can do anything about that.

But the point stands, if anybody used unquoted, or differently quoted,
values in author-script, we should at least attempt within reason to
support that.

> I am not sure what you meant by "nominally related", but the purpose
> of the author-script in these two codepaths is the same, isn't it?

The purpose is, but the means aren't. As I pointed out above, the
interactive rebase is inherently much more interactive, and needs to be
much more forgiving in its input, than `git am`.

> Somebody leaves the author information from the source (either from
> an e-mailed patch or an existing commit), so that a later step can
> use that pieces of information left in the file when (re)creating a
> commit to record the tree made by using pieces of information from
> the source.
> 
> Are our use in the author-script in these two codepaths _already_
> inconsistent?  IOW, "am" never writes malformed unquoted values,
> while the sequencer writes out in a way that is randomly quoted or
> not quoted, iow, if you fed such an author-file to "am", it wouldn't
> understand it?

We heed Postel's Law: what the sequencer writes is in a very strict
format, but what the sequencer accepts need not be.

> I fully support your position to use different codepaths, if the
> file that has the same name and that is used for the same purpose
> uses different format in these two separate codepaths and the users
> already expect them to be different.  We obviously need to have two
> separate parsers.

Well, traditionally we *do* have separate parsers. I do not say that we
need to keep that, but given what I said above, it might not be a bad idea
to keep the lenient parser required by `git rebase -i` separate from the
one used by `git am` so that the latter can be faster (by making
assumptions the other parser cannot).

> But if that is not the case, IOW, if "am"'s author-script shares the
> same issue (i.e. "'am' initially writes the file properly quoted,
> but this or that can happen to change its quoting and we need to
> read from such a file"), then perhaps sharing needs to happen the
> other way around?  This patch may prepare "rebase -i" side for the
> "this or that" (I still do not know what they are) to allow the
> resulting file read correctly, but the same "this or that" can break
> what "am" has used and is in use there if that is the case, no?
> 
> What makes it OK for "am" to expect the contents of author-script
> file to be quoted but it is not OK to expect the same here?  What
> makes it not quoted for _this_ reader, and doesn't "am" share the
> same issue?

The fact that `git am` is *non-interactive*.

Ciao,
Dscho