Re: Comparing rebase --am with --interactive via p3400

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alban and Johannes

On 01/04/2020 12:33, Alban Gruin wrote:
Hi Johannes,

Sorry for the late answer, I was really busy for the last months.

Le 31/01/2020 à 22:23, Johannes Schindelin a écrit :
Hi Alban,
  -%<-

Indeed. I offered these insights in #git-devel (slightly edited):

This `discard_index()` is in an awfully central location. I am rather
certain that it would cause problems to just remove it.

Looking at `do_merge()`: it explicitly discards and re-reads the index if
we had to spawn a `git merge` process (which we do if a strategy other
than `recursive` was specified, or if it is an octopus merge). But I am
wary of other code paths that might not be as careful.

I see that `do_exec()` is similarly careful.


I have to admit that the index is not my area of expertise in git, so
sorry if my question is stupid: isn't there a less heavy way to find
unstaged or uncommitted changes than discarding and then reloading the
index?

One thing I cannot fail to notice: we do not re-read a changed index
after running the `prepare-commit-msg` hook, or for that matter, any other
hook. That could even be an old regression from the conversion of the
interactive rebase to using the sequencer rather than a shell script.

Further, `reset_merge()` seems to spawn `git reset --merge` without
bothering to re-read the possibly modified index. Its callees are
`rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`,
none of which seem to be careful, either, about checking whether the index
was modified in the meantime.

Technically, the in-memory index should also be discarded
in `apply_autostash()`, but so far, we do not have any callers of that
function, I don't think, that wants to do anything but release resources
and exit.

The `run_git_checkout()` function discards, as intended. I
am not quite sure whether it needs to, though, unless the `.git/index`
file _was_ modified (it _is_ possible, after all, to run `git rebase -i
HEAD`, and I do have a use case for that where one of my scripts generates
a todo script, sort of a `git cherry-pick --rebase-merges`, because
`cherry-pick` does not support that mode).

I'm not sure it is worth optimizing the case where .git/index is not changed as we only do this once per rebase. In any case I hope that one day we'll stop forking git checkout and use the code from builtin/rebase.c to do it

The `continue_single_pick()` function spawns a `git
commit` which could potentially modify the index through a hook, but the
first call site does not care and the second one guards against that
(erroring out...).

My biggest concern is with the `run_git_commit()` function: it does not
re-read a potentially-modified index (think of hooks).

I agree that we should be re-reading the index after forking `git commit` and also `git merge`. Most of the time we commit without forking so that should not impact the performance too much

Thank you for your analysis.


We will need to be very careful with this `discard_index()`, I think, and
in my opinion there is a great opportunity here for cleaning up a little:
rather than discarding and re-reading the in-memory index without seeing
whether the on-disk index has changed at all appears a bit wasteful to me.

This could be refactored into a function that only discards and re-reads
the index if the mtime of `.git/index` changed. That function could then
also be taught to detect when the in-memory index has unwritten changes:
that would constitute a bug.


Hmm, checking if the mtime of the index to see if it changed isn't racy?
  Sub-second changes should happen, and to quote a comment in
is_racy_stat(), "nanosecond timestamped files can also be racy" -- even
if it should not really happen in the case of rebase…

I don't think relying on the index stat data is a good idea, git defaults to one second mtime resolution unless it is built with -DUSE_NSEC and we do way more than one commit a second. We tried to rely on stat data to determine when to re-read the todo list after an exec and it is broken (both in the design because it assumes ns mtime resolution and the implementation because we don't update the cached mtime after we rewrite the todo list). There are not that many places where we need to re-read the index so I think we should just have explicit re-reads where we need them. Hopefully over time we'll stop forking other processes and the problem will go away.

Best Wishes

Phillip

Ciao,
Dscho


Alban




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux