Re: [PATCH 5/7] p5313: add size comparison test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/21/24 3:31 PM, Taylor Blau wrote:
On Tue, Nov 05, 2024 at 03:05:05AM +0000, Derrick Stolee via GitGitGadget wrote:
From: Derrick Stolee <stolee@xxxxxxxxx>

The thin pack that simulates a push is much worse with --full-name-hash
in this case. The name hash values are doing a lot to assist with delta
bases, it seems. The big pack and shallow clone cases are slightly worse
with the --full-name-hash option. Only the full repack gains some
benefits in size.

Not a problem with your patch, but just thinking aloud: do you think
there is an easy/straightforward way to suggest when to use
--full-name-hash or not?

The kinds of heuristics I would use are:

1. Are there enough commits that enough files have enough versions
   across history that it's very important to keep deltas within a path?

2. Is the repository at least 500MB such that there is actually room for
   a "meaningful" change in size?

3. Are there a lot of name-hash collisions? (The last patch in the series
   helps do this through a test-helper, but isn't something we can expect
   end users to check themselves.)


+	cat >in-shallow <<-EOF
+	$(git rev-parse HEAD)
+	--shallow $(git rev-parse HEAD)
+	EOF
+'

I was going to comment that these could probably be moved into the
individual perf test that cares about reading each of these inputs. But
having them shared here makes sense since we are naturally comparing
generating two packs with the same input (with and without
--full-name-hash). So the shared setup here makes sense to me.

I also wanted to avoid having these commands be part of the time
measurement, even if they are extremely small.

+
+test_perf 'thin pack' '
+	git pack-objects --thin --stdout --revs --sparse  <in-thin >out
+'
+
+test_size 'thin pack size' '
+	test_file_size out
+'

Nice. I always forget about this and end up writing 'wc -c <out'.

I believe this is a Junio recommendation from an earlier version.

+test_size 'repack size' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"

Here and below, I think it's fine to inline this as in:

     test_file_size "$(ls .git/objects/pack/pack-*.pack)"

Generally I prefer to split things into stages so the verbose output
provides a clear definition of the value when calling the Git command.

...but I wonder: will using ".git" break this test in bare repositories?
Should we write instead:

     pack="$(ls $(git rev-parse --git-dir)/objects/pack/pack-*.pack)" &&
     test_file_size

?
While this would break a bare repo, the perf lib makes a bare repo be
copied into a non-bare repo as follows:

test_perf_copy_repo_contents () {
	for stuff in "$1"/*
	do
		case "$stuff" in
		*/objects|*/hooks|*/config|*/commondir|*/gitdir|*/worktrees|*/fsmonitor--daemon*)
			;;
		*)
			cp -R "$stuff" "$repo/.git/" || exit 1
			;;
		esac
	done
}

I'll still add the `git rev-parse` suggestion because it's safest.

Thanks,
-Stolee





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux