On 25/11/2019 15:15, SZEDER Gábor wrote:
On Mon, Nov 25, 2019 at 02:43:07PM +0000, Phillip Wood wrote:
On 25/11/2019 13:18, SZEDER Gábor wrote:
On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
To notice a changed todo file the sequencer stores the file's stat
data in its 'struct todo_list' instance, and compares it with the
file's current stat data after 'reword', 'squash' and 'exec'
instructions. If the two stat data doesn't match, it re-reads the
todo file.
Sounds simple, but there are some subtleties going on here:
- The 'struct todo_list' holds the stat data from the time when the
todo file was last read.
- This stat data in 'struct todo_list' is not updated when the
sequencer itself writes the todo file.
- Before executing each instruction during an interactive rebase,
the sequencer always updates the todo file by removing the
just-about-to-be-executed instruction. This changes the file's
size and inode [1].
Consequently, when the sequencer looks at the stat data after a
'reword', 'squash' or 'exec' instruction, it most likely finds that
they differ, even when the user didn't modify the todo list at all!
This is not an issue in practice, it just wastes a few cycles on
re-reading the todo list that matches what the sequencer already has
in memory anyway.
It can be much more than just a few cycles, because the total number
of parsed instructions from all the todo file re-reads can go
quadratic with the number of rebased commits.
The simple test below runs 'git rebase -i -x' on 1000 commits, which
takes over 14seconds to run. If it doesn't re-read the todo file at
all (I simply deleted the whole condition block checking the stat data
and re-reading) it runs for only ~2.5secs.
Just another angle to consider...
I know dscho was keen to avoid re-parsing the list all the time [1]
presumably because of the quadratic behavior. (He also assumed most people
were using ns stat times [2] but that appears not to be the case)
Nanosecond file timestamp comparisons are only enabled by the USE_NSEC
macro, which is only defined if the USE_NSEC Makefile knob is enabled,
but that is not enabled by default.
Then there is the related NO_NSEC Makefile knob:
# Define NO_NSEC if your "struct stat" does not have "st_ctim.tv_nsec"
# available. This automatically turns USE_NSEC off.
As Dscho mentioned in [2], we do disable nanosecond file timestamp
comparisons in 'config.mak.uname' on a handful of platforms by setting
NO_NSEC. This, however, does not mean that nanosec timestamps are
enabled on other platforms by default.
Could we
just compare the text of the todo list on disk to whats in todo->buf.buf
(with an appropriate offset)? That would avoid parsing the text and looking
up all the commits with get_oid()
Comparing the contents without parsing is still quadratic in the size
of the todo list, though I suppose with a much lower constant factor
than actually parsing it.
The patch below (assuming thunderbird doesn't mangle it) reduces the
time to run your bulk commit test from 30s to 7s, if I delete the
condition block which checks the stat data it takes 4.7s on my (somewhat
ancient) laptop. So there is a cost to the string comparison approach
but it's much less that the full todo list parsing.
Best Wishes
Phillip
--- >8 ---
diff --git a/sequencer.c b/sequencer.c
index 8952cfa89b..a3efdae0a5 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3909,12 +3909,17 @@ static int pick_commits(struct repository *r,
arg, item->arg_len,
opts, res, 0);
} else if (check_todo && !res) {
- struct stat st;
-
- if (stat(get_todo_path(opts), &st)) {
- res = error_errno(_("could not stat '%s'"),
+ int offset;
+ struct strbuf buf = STRBUF_INIT;
+ if (strbuf_read_file(&buf, get_todo_path(opts),
8096) < 0)
+ res = error_errno(_("could not read '%s'"),
get_todo_path(opts));
- } else if (match_stat_data(&todo_list->stat, &st)) {
+
+ offset = get_item_line_offset(todo_list,
+ todo_list->current
+ 1);
+ if (buf.len != todo_list->buf.len - offset ||
+ memcmp(buf.buf, todo_list->buf.buf + offset,
buf.len)) {
+ fputs("re-reading todo list\n", stderr);
/* Reread the todo file if it has
changed. */
todo_list_release(todo_list);
if (read_populate_todo(r, todo_list, opts))