Re: [PATCH] sequencer: honor GIT_REFLOG_ACTION

Phillip Wood <phillip.wood123@xxxxxxxxx> · Thu, 2 Apr 2020 10:39:34 +0100

On 02/04/2020 06:15, Elijah Newren wrote:
On Wed, Apr 1, 2020 at 4:29 PM Ian Jackson
<ijackson@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi.  Thanks for looking at this.

Elijah Newren via GitGitGadget writes ("[PATCH] sequencer: honor GIT_REFLOG_ACTION"):
     I'm not the best with getenv/setenv. The xstrdup() wrapping is
     apparently necessary on mac and bsd. The xstrdup seems like it leaves us
     with a memory leak, but since setenv(3) says to not alter or free it, I
     think it's right. Anyone have any alternative suggestions?

I can try to help.  It's not entirely trivial.

The setenv interface is a wrapper around putenv.  putenv has had a
variety of different semantics.  Some of these sets of semantics
cannot be used to re-set the same environment variable without a
memory leak - and even figuring out what semantics you have would be
complex and tend to produce code which would fail in bad ways.
There's a short summary of the situation in Linux's putenv(3).

Would it be possible for git to arrange to set GIT_REFLOG_ACTION only
when it is invoking subprocesses ? Otherwise it would update, and >> look at, a global variable of its own.  (Or a parameter to relevant
functions if one doesn't like the action-at-a-distance effect of a
global.)

And, it seems to me that the reflog handling should be centralised.

+     char *reflog_action = getenv("GIT_REFLOG_ACTION");

       va_start(ap, fmt);
       strbuf_reset(&buf);
-     strbuf_addstr(&buf, action_name(opts));
+     strbuf_addstr(&buf, reflog_action ? reflog_action : action_name(opts));

Open coding this kind of thing at every site which needs to think
about the reflog actions will surely result in some of the instances
having bugs.

Writing a single function that contans this (or most of it) would
happily decouple all of its call sites from literally asking about
getenv("GIT_REFLOG_ACTION") thereby making it easier to do the
indirection-through-program-variables I suggest.

That sounds great, but I'm not sure that "only when invoking
subprocesses" will limit the places where we set the environment
variable all that much; it might actually expand it.

Off hand I think we'd need to change run_git_checkout(), 
run_git_commit() and do_merge() to set the environment variable and fix 
try_to_commit() to use a proper variable for the reflog message.

 I wasn't there
for the whole history, but my understanding is the rebase code has
slowly transformed from the original all-shell rebase
implementation(s), to being a helper program that the shell could call
into for parts of its operations and passing control back and forth
between shell and C, to being a reimplementation of just invoking the
same commands that the shell script would have, to slowly transforming
into an actual library where invocations of other git subprocesses are
being replaced with relevant function calls.  It's a long cleanup
process that is still ongoing.  I'd like to get to the point where we
only invoke subprocesses if the user specifies --exec or a special
merge strategy, but that's a goal with a longer term timeframe than
fixing a 2.26 regression.

Having said that,

diff --git a/t/t3406-rebase-message.sh b/t/t3406-rebase-message.sh
index 61b76f33019..927a4f4a4e4 100755
--- a/t/t3406-rebase-message.sh
+++ b/t/t3406-rebase-message.sh

This test case convinces me that the patch has the right behaviour for
at least the case I care about :-).

Cool, sounds like it's a good immediate fix for the 2.26 regression,
and then longer term as we continue refactoring we can hopefully
isolate subprocess handling and writing of state.

As a heads up, though, my personal plans for rebase (subject to buy-in
from other stakeholders) is to make it do a lot more in-memory work.
In particular, this means for common cases there will be no subprocess
invocations, no writing of any state unless/until you hit a conflict,
no updating of any files in the working tree until all commits have
been created (or a conflict is hit), 

I'm with you up to here - it sounds fantastic

and no updating of the branch
until after all the commits have been created.

We only update the branch reflog at the end of the rebase now.

Thus, for the common
cases with no conflicts, there would only be 1 entry in the reflog of
HEAD the entire operation, rather than approximately 1 per commit. 

This I'm not so sure about. In the past where I've messed up a rebase 
and not noticed until after a subsequent rebase it has been really 
useful to be able to go through HEAD's reflog and find out where exactly 
I messed up by looking at the sequence of picks for the first rebase. 
Specifically it shows which commits where squashed together which you 
cannot get by running git log on the result of the rebase.

I
have a proof-of-concept showing these ideas work for basic cases. 

Sounds exciting

Best Wishes

Phillip

 So,
I hope your tests don't depend on the number of entries added to
HEAD's reflog.