On 25/11/2019 13:18, SZEDER Gábor wrote:
On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
To notice a changed todo file the sequencer stores the file's stat
data in its 'struct todo_list' instance, and compares it with the
file's current stat data after 'reword', 'squash' and 'exec'
instructions. If the two stat data doesn't match, it re-reads the
todo file.
Sounds simple, but there are some subtleties going on here:
- The 'struct todo_list' holds the stat data from the time when the
todo file was last read.
- This stat data in 'struct todo_list' is not updated when the
sequencer itself writes the todo file.
- Before executing each instruction during an interactive rebase,
the sequencer always updates the todo file by removing the
just-about-to-be-executed instruction. This changes the file's
size and inode [1].
Consequently, when the sequencer looks at the stat data after a
'reword', 'squash' or 'exec' instruction, it most likely finds that
they differ, even when the user didn't modify the todo list at all!
This is not an issue in practice, it just wastes a few cycles on
re-reading the todo list that matches what the sequencer already has
in memory anyway.
It can be much more than just a few cycles, because the total number
of parsed instructions from all the todo file re-reads can go
quadratic with the number of rebased commits.
The simple test below runs 'git rebase -i -x' on 1000 commits, which
takes over 14seconds to run. If it doesn't re-read the todo file at
all (I simply deleted the whole condition block checking the stat data
and re-reading) it runs for only ~2.5secs.
Just another angle to consider...
I know dscho was keen to avoid re-parsing the list all the time [1]
presumably because of the quadratic behavior. (He also assumed most
people were using ns stat times [2] but that appears not to be the case)
Could we just compare the text of the todo list on disk to whats in
todo->buf.buf (with an appropriate offset)? That would avoid parsing the
text and looking up all the commits with get_oid()
Best Wishes
Phillip
[1]
https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/
[2]
https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/
--- >8 ---
test_expect_success 'test' '
num_commits=1000 &&
test_commit_bulk --filename=file $num_commits &&
/usr/bin/time -f "Elapsed time: %e" \
git rebase -i --root -x true 2>out &&
grep "Executing: true" out >actual &&
test_line_count = $num_commits actual &&
# show the elapsed time
tail -n2 out
'
--- >8 ---