Hi Phillip, On Wed, 1 Apr 2020, Phillip Wood wrote: > On 01/04/2020 12:33, Alban Gruin wrote: > > Hi Johannes, > > > > Sorry for the late answer, I was really busy for the last months. > > > > Le 31/01/2020 à 22:23, Johannes Schindelin a écrit : > > > Hi Alban, > > > -%<- > > > > > > Indeed. I offered these insights in #git-devel (slightly edited): > > > > > > This `discard_index()` is in an awfully central location. I am rather > > > certain that it would cause problems to just remove it. > > > > > > Looking at `do_merge()`: it explicitly discards and re-reads the index if > > > we had to spawn a `git merge` process (which we do if a strategy other > > > than `recursive` was specified, or if it is an octopus merge). But I am > > > wary of other code paths that might not be as careful. > > > > > > I see that `do_exec()` is similarly careful. > > > > > > > I have to admit that the index is not my area of expertise in git, so > > sorry if my question is stupid: isn't there a less heavy way to find > > unstaged or uncommitted changes than discarding and then reloading the > > index? > > > > > One thing I cannot fail to notice: we do not re-read a changed index > > > after running the `prepare-commit-msg` hook, or for that matter, any other > > > hook. That could even be an old regression from the conversion of the > > > interactive rebase to using the sequencer rather than a shell script. > > > > > > Further, `reset_merge()` seems to spawn `git reset --merge` without > > > bothering to re-read the possibly modified index. Its callees are > > > `rollback_single_pick()`, `skip_single_pick()` and `sequencer_rollback()`, > > > none of which seem to be careful, either, about checking whether the index > > > was modified in the meantime. > > > > > > Technically, the in-memory index should also be discarded > > > in `apply_autostash()`, but so far, we do not have any callers of that > > > function, I don't think, that wants to do anything but release resources > > > and exit. > > > > > > The `run_git_checkout()` function discards, as intended. I > > > am not quite sure whether it needs to, though, unless the `.git/index` > > > file _was_ modified (it _is_ possible, after all, to run `git rebase -i > > > HEAD`, and I do have a use case for that where one of my scripts generates > > > a todo script, sort of a `git cherry-pick --rebase-merges`, because > > > `cherry-pick` does not support that mode). > > I'm not sure it is worth optimizing the case where .git/index is not changed > as we only do this once per rebase. In any case I hope that one day we'll stop > forking git checkout and use the code from builtin/rebase.c to do it > > > > The `continue_single_pick()` function spawns a `git > > > commit` which could potentially modify the index through a hook, but the > > > first call site does not care and the second one guards against that > > > (erroring out...). > > > > > > My biggest concern is with the `run_git_commit()` function: it does not > > > re-read a potentially-modified index (think of hooks). > > I agree that we should be re-reading the index after forking `git commit` and > also `git merge`. Most of the time we commit without forking so that should > not impact the performance too much > > > Thank you for your analysis. > > > > > > > > We will need to be very careful with this `discard_index()`, I think, and > > > in my opinion there is a great opportunity here for cleaning up a little: > > > rather than discarding and re-reading the in-memory index without seeing > > > whether the on-disk index has changed at all appears a bit wasteful to me. > > > > > > This could be refactored into a function that only discards and re-reads > > > the index if the mtime of `.git/index` changed. That function could then > > > also be taught to detect when the in-memory index has unwritten changes: > > > that would constitute a bug. > > > > > > > Hmm, checking if the mtime of the index to see if it changed isn't racy? > > Sub-second changes should happen, and to quote a comment in > > is_racy_stat(), "nanosecond timestamped files can also be racy" -- even > > if it should not really happen in the case of rebase… > > I don't think relying on the index stat data is a good idea, git defaults to > one second mtime resolution unless it is built with -DUSE_NSEC and we do way > more than one commit a second. We tried to rely on stat data to determine when > to re-read the todo list after an exec and it is broken (both in the design > because it assumes ns mtime resolution and the implementation because we don't > update the cached mtime after we rewrite the todo list). There are not that > many places where we need to re-read the index so I think we should just have > explicit re-reads where we need them. Hopefully over time we'll stop forking > other processes and the problem will go away. Well. Even the 1-second granularity should buy us some performance if we assume that `same mtime` == `racy`. That should still catch the majority of the cases where the index was simply not changed, at least in the `do_exec()` case. Ciao, Dscho > > Best Wishes > > Phillip > > > > Ciao, > > > Dscho > > > > > > > Alban > > >