Re: [PATCH v3 24/34] t/perf/p7519: speed up test using "test-tool touch"

On Tue, Jul 13 2021, Jeff Hostetler wrote:

> On 7/13/21 2:18 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Tue, Jul 13 2021, Jeff Hostetler wrote:
>> 
>>> On 7/1/21 7:09 PM, Ævar Arnfjörð Bjarmason wrote:
>>>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote:
>>>>
>>>>> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>>>>>
>>>>> Change p7519 to use a single "test-tool touch" command to update
>>>>> the mtime on a series of (thousands) files instead of invoking
>>>>> thousands of commands to update a single file.
>>>>>
>>>>> This is primarily for Windows where process creation is so
>>>>> very slow and reduces the test run time by minutes.
>>>>>
>>>>> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>>>>> ---
>>>>>    t/perf/p7519-fsmonitor.sh | 14 ++++++--------
>>>>>    1 file changed, 6 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
>>>>> index 5eb5044a103..f74e6014a0a 100755
>>>>> --- a/t/perf/p7519-fsmonitor.sh
>>>>> +++ b/t/perf/p7519-fsmonitor.sh
>>>>> @@ -119,10 +119,11 @@ test_expect_success "one time repo setup" '
>>>>>    	fi &&
>>>>>      	mkdir 1_file 10_files 100_files 1000_files 10000_files &&
>>>>> -	for i in $(test_seq 1 10); do touch 10_files/$i; done &&
>>>>> -	for i in $(test_seq 1 100); do touch 100_files/$i; done &&
>>>>> -	for i in $(test_seq 1 1000); do touch 1000_files/$i; done &&
>>>>> -	for i in $(test_seq 1 10000); do touch 10000_files/$i; done &&
>>>>> +	test-tool touch sequence --pattern="10_files/%d" --start=1 --count=10 &&
>>>>> +	test-tool touch sequence --pattern="100_files/%d" --start=1 --count=100 &&
>>>>> +	test-tool touch sequence --pattern="1000_files/%d" --start=1 --count=1000 &&
>>>>> +	test-tool touch sequence --pattern="10000_files/%d" --start=1 --count=10000 &&
>>>>> +
>>>>>    	git add 1_file 10_files 100_files 1000_files 10000_files &&
>>>>>    	git commit -qm "Add files" &&
>>>>>
>>>>> @@ -200,15 +201,12 @@ test_fsmonitor_suite() {
>>>>>    	# Update the mtimes on upto 100k files to make status think
>>>>>    	# that they are dirty.  For simplicity, omit any files with
>>>>>    	# LFs (i.e. anything that ls-files thinks it needs to dquote).
>>>>> -	# Then fully backslash-quote the paths to capture any
>>>>> -	# whitespace so that they pass thru xargs properly.
>>>>>    	#
>>>>>    	test_perf_w_drop_caches "status (dirty) ($DESC)" '
>>>>>    		git ls-files | \
>>>>>    			head -100000 | \
>>>>>    			grep -v \" | \
>>>>> -			sed '\''s/\(.\)/\\\1/g'\'' | \
>>>>> -			xargs test-tool chmtime -300 &&
>>>>> +			test-tool touch stdin &&
>>>>>    		git status
>>>>>    	'
>>>> Did you try to replace this with some variant of:
>>>>       test_seq 1 10000 | xargs touch
>>>> Which (depending on your xargs version) would invoke "touch" commands
>>>> with however many argv items it thinks you can handle.
>>>>
>>>
>>> a quick test on my Windows machine shows that
>>>
>>> 	test_seq 1 10000 | xargs touch
>>>
>>> takes 3.1 seconds.
>>>
>>> just a simple
>>>
>>> 	test_seq 1 10000 >/dev/null
>>>
>>> take 0.2 seconds.
>>>
>>> using my test-tool helper cuts that time in half.
>> There's what Elijah mentioned about test_seq, so maybe it's just that.
>> But what I was suggesting was using the xargs mode where it does N
>> arguments at a time.
>> Does this work for you, and does it cause xargs to invoke "touch" with
>> the relevant N number of arguments, and does it help with the
>> performance?
>>      test_seq 1 10000 | xargs touch
>>      test_seq 1 10000 | xargs -n 10 touch
>>      test_seq 1 10000 | xargs -n 100 touch
>>      test_seq 1 10000 | xargs -n 1000 touch
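
(As an aside, since the question is about how xargs groups its stdin into
argv batches: with plain echo standing in for touch the batching is easy
to see, each output line being one child invocation. This uses the
coreutils seq, not our test_seq:)

```shell
# With -n 4 the ten arguments are split into batches of 4, 4 and 2;
# each line of output corresponds to one spawned "echo" process.
seq 1 10 | xargs -n 4 echo
```

which prints "1 2 3 4", "5 6 7 8" and "9 10" on three separate lines.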
>
> The GFW SDK version of xargs does have `-n N` and it does work as
> advertised.  And it does slow things down considerably.  Letting it
> do ~2500 per command in 4 commands took the 3.1 seconds listed above.
>
> Adding -n 100 makes it take 5.7 seconds, so process creation overhead
> is a factor here.

Doesn't -n 2500 being faster than -n 100 suggest the opposite of process
overhead being the deciding factor? With -n 2500 you'll invoke 4 touch
processes, so each one takes 3.1/4 =~ 0.8s to run, whereas with -n 100 you
invoke 100 of them, so if the overall time is then 5.7 seconds that's
5.7/100 =~ 0.06s per process.

Or am I misunderstanding you, or does some implicit parallelism kick in
with that version of xargs depending on -n?
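
(For reference, GNU xargs at least only runs children in parallel when -P
is passed explicitly; without it the batches run one after another. A
sketch of the parallel variant, with a scratch directory just to keep the
created files out of the way:)

```shell
# Run up to 4 concurrent "touch" children, 100 arguments each;
# drop the -P and xargs runs the same batches strictly serially.
dir=$(mktemp -d)
cd "$dir"
seq 1 10000 | xargs -n 100 -P 4 touch
ls | wc -l   # all 10000 files exist either way
```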

>> etc.
>> Also I didn't notice this before, but the -300 part of "chmtime -300"
>> was redundant before then? I.e. you're implicitly changing it to "=+0"
>> instead with your "touch" helper, are you not?
>> 
>
> Right. I'm changing it to the current time.

If that "while we're at it change the behavior of the test" is wanted I
think it should be called out in the commit message. Right now it looks
like it might be an unintentional regression in the test.




