Re: [PATCH] t3429: try to protect against a potential racy todo file problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/11/2019 13:18, SZEDER Gábor wrote:
On Sun, Nov 24, 2019 at 10:10:21PM +0100, SZEDER Gábor wrote:
To notice a changed todo file the sequencer stores the file's stat
data in its 'struct todo_list' instance, and compares it with the
file's current stat data after 'reword', 'squash' and 'exec'
instructions.  If the two stat data doesn't match, it re-reads the
todo file.

Sounds simple, but there are some subtleties going on here:

   - The 'struct todo_list' holds the stat data from the time when the
     todo file was last read.

   - This stat data in 'struct todo_list' is not updated when the
     sequencer itself writes the todo file.

   - Before executing each instruction during an interactive rebase,
     the sequencer always updates the todo file by removing the
     just-about-to-be-executed instruction.  This changes the file's
     size and inode [1].

Consequently, when the sequencer looks at the stat data after a
'reword', 'squash' or 'exec' instruction, it most likely finds that
they differ, even when the user didn't modify the todo list at all!
This is not an issue in practice, it just wastes a few cycles on
re-reading the todo list that matches what the sequencer already has
in memory anyway.

It can be much more than just a few cycles, because the total number
of parsed instructions from all the todo file re-reads can go
quadratic with the number of rebased commits.

The simple test below runs 'git rebase -i -x' on 1000 commits, which
takes over 14seconds to run.  If it doesn't re-read the todo file at
all (I simply deleted the whole condition block checking the stat data
and re-reading) it runs for only ~2.5secs.

Just another angle to consider...

I know dscho was keen to avoid re-parsing the list all the time [1] presumably because of the quadratic behavior. (He also assumed most people were using ns stat times [2] but that appears not to be the case) Could we just compare the text of the todo list on disk to whats in todo->buf.buf (with an appropriate offset)? That would avoid parsing the text and looking up all the commits with get_oid()

Best Wishes

Phillip

[1] https://public-inbox.org/git/alpine.DEB.2.20.1703021617510.3767@virtualbox/ [2] https://public-inbox.org/git/alpine.DEB.2.20.1704131526500.2135@virtualbox/


   ---  >8  ---

test_expect_success 'test' '
	num_commits=1000 &&
	test_commit_bulk --filename=file $num_commits &&

	/usr/bin/time -f "Elapsed time: %e" \
		git rebase -i --root -x true 2>out &&

	grep "Executing: true" out >actual &&
	test_line_count = $num_commits actual &&

	# show the elapsed time
	tail -n2 out
'

   ---  >8  ---




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux