Re: unifying sequencer's options persisting, was Re: [PATCH v2] sequencer: fix edit handling for cherry-pick and revert messages

Phillip Wood <phillip.wood123@xxxxxxxxx> · Fri, 2 Apr 2021 12:28:43 +0100

Hi Dscho and Elijah

On 31/03/2021 14:48, Johannes Schindelin wrote:
Hi,

On Tue, 30 Mar 2021, Elijah Newren wrote:

On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:

I'll allow myself one tangent: the subject of the sequencer's Unix
shell script heritage seems to come up with an increasing frequency,
in particular the awful "let's write out one file per setting"
strategy.

I would _love_ for `save_opts()` to write a JSON instead (or an INI
via the `git_config_*()` family of functions, as is done already by
the cherry-pick/revert stuff), now that we no longer have any shell
script backend (apart from `--preserve-merges`, but that one is on its
way out anyway).

The one thing that concerns me with this idea is that I know for a
fact that some enterprisey users play games with those files inside
`<GIT_DIR>/rebase-merge` that should be considered internal
implementation details. Not sure how to deprecate that properly, I
don't think we have a sane way to detect whether users rely on these
implementation details other than breaking their expectations, which
is not really a gentle way to ask them to update their scripts.

I think it depends if users are just reading the files or writing to 
them. If they are reading them (which my scripts do) then we could 
continue to write the important files along side the new single-file 
that git actually reads. I think there is a distinction between the 
files such as git-rebase-todo which hold state and the one-line files 
which hold the options passed by the user. For example I have scripts 
that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head 
and check that author-script exists. If a script starts a rebase then it 
should know what options are in effect without reading them from 
.git/rebase-merge.

Ooh, I'm glad you took this tangent.  May I follow it for a second?
I'd really, really like this too, for three reasons:

1) I constantly get confused about the massive duplication and
difference in control structures and which ones affect which type of
operations in sequencer.c.  Having them both use an ini-file would
allow us to remove lots of that duplication.  I'm sure there'll still
be some rebase-specific or cherry-pick-specific options, but we don't
need a separate parallel structure for every piece of config.

2) In my merge-ort optimization work, rebasing 35 commits in linux.git
across a massive set of 26K upstream renames has dropped rename
detection time from the top spot.  And it also dropped several other
things in the merge machinery from their spots as well.  Do you know
what's the slowest now?  Wasted time from sequencer.c: the unnecessary
process forking and all the useless disk writing (which I suspect is
mostly updating the index and working directory but also writing all
the individual control files under .git/rebase-merge/).  And this
stuff from sequencer.c is not just barely the slowest part, it's the
slowest by a wide margin.

I think we do a lot of needless writing which is unrelated to whether we 
write to one file or may files. For example from memory picking a commit 
involves writing the 'message', 'author-script', 'rewritten-pending' 
(which is often immediately deleted), 'rewritten-list' (we append to 
that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the 
pick has become empty), 'git-rebase-todo' is completely rewritten, and 
'done' is appended to. None of this is necessary. For rewording and 
merges the only thing that needs to be written is the commit message for 
the external process to use. Fixup and squash add a couple more files 
into the mix.

I think we would save a lot by only syncing the state to disk when we 
stop or run an exec command (the state needs to be synced so exec 
commands can alter the todo list). In those cases we need to write the 
index and possibly run an external process so writing a couple of files 
is probably insignificant.

Where I think we can usefully consolidate is the one-line files which 
store the options rather than state - these are read an written much 
less frequently so I don't think they have much of a performance hit but 
it would be much nicer to just serialize the options to a single file.

3) I also want to allow cherry-picking or rebasing branches that
aren't even checked out (assuming no conflicts are triggered;
otherwise an error can be shown with the user asked to repeat with the
operation connected to an active working directory).

Exciting!

 For such an
operation, the difference between "cherry-pick" and "rebase" is nearly
irrelevant so you'd expect the code to be the same; every time I look
at the code, though, it seems that the control structures are
separating these two types of operations in more areas than just the
reading and writing of the config.

Yes this can be confusing, for example rebase and cherry-pick handle the 
todo list differently. Rebase removes the command before trying to pick 
the commit and adds it back if the pick fails for a non-conflict reason, 
cherry-pick only removes the command if the pick is successful.

Best Wishes

Phillip

Excellent, we're in agreement, then.

The remaining question is: how do we want to go about it? Do we just want
to codify the notion that these are internal details that are nobody's
business, and if they are using the exact file system layout of the
current sequencer, then it's their responsibility to adapt?

Ciao,
Dscho