Re: unifying sequencer's options persisting, was Re: [PATCH v2] sequencer: fix edit handling for cherry-pick and revert messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dscho and Elijah

On 31/03/2021 14:48, Johannes Schindelin wrote:
Hi,

On Tue, 30 Mar 2021, Elijah Newren wrote:

On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:

I'll allow myself one tangent: the subject of the sequencer's Unix
shell script heritage seems to come up with an increasing frequency,
in particular the awful "let's write out one file per setting"
strategy.

I would _love_ for `save_opts()` to write a JSON instead (or an INI
via the `git_config_*()` family of functions, as is done already by
the cherry-pick/revert stuff), now that we no longer have any shell
script backend (apart from `--preserve-merges`, but that one is on its
way out anyway).

The one thing that concerns me with this idea is that I know for a
fact that some enterprisey users play games with those files inside
`<GIT_DIR>/rebase-merge` that should be considered internal
implementation details. Not sure how to deprecate that properly, I
don't think we have a sane way to detect whether users rely on these
implementation details other than breaking their expectations, which
is not really a gentle way to ask them to update their scripts.

I think it depends if users are just reading the files or writing to them. If they are reading them (which my scripts do) then we could continue to write the important files along side the new single-file that git actually reads. I think there is a distinction between the files such as git-rebase-todo which hold state and the one-line files which hold the options passed by the user. For example I have scripts that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head and check that author-script exists. If a script starts a rebase then it should know what options are in effect without reading them from .git/rebase-merge.

Ooh, I'm glad you took this tangent.  May I follow it for a second?
I'd really, really like this too, for three reasons:

1) I constantly get confused about the massive duplication and
difference in control structures and which ones affect which type of
operations in sequencer.c.  Having them both use an ini-file would
allow us to remove lots of that duplication.  I'm sure there'll still
be some rebase-specific or cherry-pick-specific options, but we don't
need a separate parallel structure for every piece of config.

2) In my merge-ort optimization work, rebasing 35 commits in linux.git
across a massive set of 26K upstream renames has dropped rename
detection time from the top spot.  And it also dropped several other
things in the merge machinery from their spots as well.  Do you know
what's the slowest now?  Wasted time from sequencer.c: the unnecessary
process forking and all the useless disk writing (which I suspect is
mostly updating the index and working directory but also writing all
the individual control files under .git/rebase-merge/).  And this
stuff from sequencer.c is not just barely the slowest part, it's the
slowest by a wide margin.

I think we do a lot of needless writing which is unrelated to whether we write to one file or may files. For example from memory picking a commit involves writing the 'message', 'author-script', 'rewritten-pending' (which is often immediately deleted), 'rewritten-list' (we append to that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the pick has become empty), 'git-rebase-todo' is completely rewritten, and 'done' is appended to. None of this is necessary. For rewording and merges the only thing that needs to be written is the commit message for the external process to use. Fixup and squash add a couple more files into the mix.

I think we would save a lot by only syncing the state to disk when we stop or run an exec command (the state needs to be synced so exec commands can alter the todo list). In those cases we need to write the index and possibly run an external process so writing a couple of files is probably insignificant.

Where I think we can usefully consolidate is the one-line files which store the options rather than state - these are read an written much less frequently so I don't think they have much of a performance hit but it would be much nicer to just serialize the options to a single file.


3) I also want to allow cherry-picking or rebasing branches that
aren't even checked out (assuming no conflicts are triggered;
otherwise an error can be shown with the user asked to repeat with the
operation connected to an active working directory).

Exciting!

 For such an
operation, the difference between "cherry-pick" and "rebase" is nearly
irrelevant so you'd expect the code to be the same; every time I look
at the code, though, it seems that the control structures are
separating these two types of operations in more areas than just the
reading and writing of the config.

Yes this can be confusing, for example rebase and cherry-pick handle the todo list differently. Rebase removes the command before trying to pick the commit and adds it back if the pick fails for a non-conflict reason, cherry-pick only removes the command if the pick is successful.

Best Wishes

Phillip

Excellent, we're in agreement, then.

The remaining question is: how do we want to go about it? Do we just want
to codify the notion that these are internal details that are nobody's
business, and if they are using the exact file system layout of the
current sequencer, then it's their responsibility to adapt?

Ciao,
Dscho





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux