On Mon, Nov 25, 2019 at 02:43:07PM +0000, Phillip Wood wrote: > On 25/11/2019 13:18, SZEDER Gábor wrote: > >On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote: > >>To notice a changed todo file the sequencer stores the file's stat > >>data in its 'struct todo_list' instance, and compares it with the > >>file's current stat data after 'reword', 'squash' and 'exec' > >>instructions. If the two stat data doesn't match, it re-reads the > >>todo file. > >> > >>Sounds simple, but there are some subtleties going on here: > >> > >> - The 'struct todo_list' holds the stat data from the time when the > >> todo file was last read. > >> > >> - This stat data in 'struct todo_list' is not updated when the > >> sequencer itself writes the todo file. > >> > >> - Before executing each instruction during an interactive rebase, > >> the sequencer always updates the todo file by removing the > >> just-about-to-be-executed instruction. This changes the file's > >> size and inode [1]. > >> > >>Consequently, when the sequencer looks at the stat data after a > >>'reword', 'squash' or 'exec' instruction, it most likely finds that > >>they differ, even when the user didn't modify the todo list at all! > >>This is not an issue in practice, it just wastes a few cycles on > >>re-reading the todo list that matches what the sequencer already has > >>in memory anyway. > > > >It can be much more than just a few cycles, because the total number > >of parsed instructions from all the todo file re-reads can go > >quadratic with the number of rebased commits. > > > >The simple test below runs 'git rebase -i -x' on 1000 commits, which > >takes over 14seconds to run. If it doesn't re-read the todo file at > >all (I simply deleted the whole condition block checking the stat data > >and re-reading) it runs for only ~2.5secs. > > > >Just another angle to consider... > > I know dscho was keen to avoid re-parsing the list all the time [1] > presumably because of the quadratic behavior. (He also assumed most people > were using ns stat times [2] but that appears not to be the case) Nanosecond file timestamp comparisons are only enabled by the USE_NSEC macro, which is only defined if the USE_NSEC Makefile knob is enabled, but that is not enabled by default. Then there is the related NO_NSEC Makefile knob: # Define NO_NSEC if your "struct stat" does not have "st_ctim.tv_nsec" # available. This automatically turns USE_NSEC off. As Dscho mentioned in [2], we do disable nanosecond file timestamp comparisons in 'config.mak.uname' on a handful of platforms by setting NO_NSEC. This, however, does not mean that nanosec timestamps are enabled on other platforms by default. > Could we > just compare the text of the todo list on disk to whats in todo->buf.buf > (with an appropriate offset)? That would avoid parsing the text and looking > up all the commits with get_oid() Comparing the contents without parsing is still quadratic in the size of the todo list, though I suppose with a much lower constant factor than actually parsing it. > [1] > https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/ > [2] > https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/ > > > > > --- >8 --- > > > >test_expect_success 'test' ' > > num_commits=1000 && > > test_commit_bulk --filename=file $num_commits && > > > > /usr/bin/time -f "Elapsed time: %e" \ > > git rebase -i --root -x true 2>out && > > > > grep "Executing: true" out >actual && > > test_line_count = $num_commits actual && > > > > # show the elapsed time > > tail -n2 out > >' > > > > --- >8 --- > >