Re: [PATCH v2 2/3] reftable/stack: use geometric table compaction

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 21, 2024 at 10:40:18PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@xxxxxxxxx>
> 
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
> 
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
> 
> git branch -f foo
> git branch -d foo
> 
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
> 
> Instead, to avoid unbounded growth of the table list, the compaction
> strategy is updated to ensure tables follow a geometric sequence after
> each operation. This is done by walking the table list in reverse index
> order to identify the compaction segment start and end. The compaction
> segment end is found by identifying the first table which has a
> preceding table size less than twice the current table. Next, the
> compaction segment start is found iterating through the remaining tables
> in the list checking if the previous table size is less than twice the
> cumulative of tables from the segment end. This ensures the correct
> segment start is found and that the newly compacted table does not
> violate the geometric sequence.

I don't think we need to go into so much detail how exactly the
algorithm is working -- these kind of comments should ideally exist in
the code. What would be more interesting to explain here is _why_ we
chose the new algorithm over the old one instead of just trying to fix
the issue.

Other than that this patch LGTM.

Patrick

> When creating 10 thousand references, the new strategy has no
> performance impact:
> 
> Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
>   Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
>   Range (min … max):   26.447 s … 26.569 s    10 runs
> 
> Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
>   Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
>   Range (min … max):   26.366 s … 26.444 s    10 runs
> 
> Summary
>   update-ref: create refs sequentially (revision = HEAD) ran
>     1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
> 
> Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
> tables and are therefore updated to specify the correct new table count.
> Since compaction is more aggressive in ensuring tables maintain a
> geometric sequence, the expected table count is reduced in these tests.
> In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
> removed because the function is no longer needed. Also, the
> `test_suggest_compaction_segment()` test is updated to better showcase
> and reflect the new geometric compaction behavior.
> 
> Signed-off-by: Justin Tobler <jltobler@xxxxxxxxx>
> ---
>  reftable/stack.c           | 109 ++++++++++++++++---------------------
>  reftable/stack.h           |   3 -
>  reftable/stack_test.c      |  66 +++++-----------------
>  t/t0610-reftable-basics.sh |  43 ++++++++++-----
>  4 files changed, 91 insertions(+), 130 deletions(-)
> 
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 2370d93d13b..ef55dc75cde 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1214,75 +1214,62 @@ static int segment_size(struct segment *s)
>  	return s->end - s->start;
>  }
>  
> -int fastlog2(uint64_t sz)
> -{
> -	int l = 0;
> -	if (sz == 0)
> -		return 0;
> -	for (; sz; sz /= 2) {
> -		l++;
> -	}
> -	return l - 1;
> -}
> -
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
> -{
> -	struct segment *segs = reftable_calloc(n, sizeof(*segs));
> -	struct segment cur = { 0 };
> -	size_t next = 0, i;
> -
> -	if (n == 0) {
> -		*seglen = 0;
> -		return segs;
> -	}
> -	for (i = 0; i < n; i++) {
> -		int log = fastlog2(sizes[i]);
> -		if (cur.log != log && cur.bytes > 0) {
> -			struct segment fresh = {
> -				.start = i,
> -			};
> -
> -			segs[next++] = cur;
> -			cur = fresh;
> -		}
> -
> -		cur.log = log;
> -		cur.end = i + 1;
> -		cur.bytes += sizes[i];
> -	}
> -	segs[next++] = cur;
> -	*seglen = next;
> -	return segs;
> -}
> -
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
>  {
> -	struct segment min_seg = {
> -		.log = 64,
> -	};
> -	struct segment *segs;
> -	size_t seglen = 0, i;
> -
> -	segs = sizes_to_segments(&seglen, sizes, n);
> -	for (i = 0; i < seglen; i++) {
> -		if (segment_size(&segs[i]) == 1)
> -			continue;
> +	struct segment seg = { 0 };
> +	uint64_t bytes;
> +	size_t i;
>  
> -		if (segs[i].log < min_seg.log)
> -			min_seg = segs[i];
> -	}
> +	/*
> +	 * If there are no tables or only a single one then we don't have to
> +	 * compact anything. The sequence is geometric by definition already.
> +	 */
> +	if (n <= 1)
> +		return seg;
>  
> -	while (min_seg.start > 0) {
> -		size_t prev = min_seg.start - 1;
> -		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
> +	/*
> +	 * Find the ending table of the compaction segment needed to restore the
> +	 * geometric sequence.
> +	 *
> +	 * To do so, we iterate backwards starting from the most recent table
> +	 * until a valid segment end is found. If the preceding table is smaller
> +	 * than the current table multiplied by the geometric factor (2), the
> +	 * current table is set as the compaction segment end.
> +	 *
> +	 * Tables after the ending point are not added to the byte count because
> +	 * they are already valid members of the geometric sequence. Due to the
> +	 * properties of a geometric sequence, it is not possible for the sum of
> +	 * these tables to exceed the value of the ending point table.
> +	 */
> +	for (i = n - 1; i > 0; i--) {
> +		if (sizes[i - 1] < sizes[i] * 2) {
> +			seg.end = i + 1;
> +			bytes = sizes[i];
>  			break;
> +		}
> +	}
> +
> +	/*
> +	 * Find the starting table of the compaction segment by iterating
> +	 * through the remaining tables and keeping track of the accumulated
> +	 * size of all tables seen from the segment end table.
> +	 *
> +	 * Note that we keep iterating even after we have found the first
> +	 * starting point. This is because there may be tables in the stack
> +	 * preceding that first starting point which violate the geometric
> +	 * sequence.
> +	 */
> +	for (; i > 0; i--) {
> +		uint64_t curr = bytes;
> +		bytes += sizes[i - 1];
>  
> -		min_seg.start = prev;
> -		min_seg.bytes += sizes[prev];
> +		if (sizes[i - 1] < curr * 2) {
> +			seg.start = i - 1;
> +			seg.bytes = bytes;
> +		}
>  	}
>  
> -	reftable_free(segs);
> -	return min_seg;
> +	return seg;
>  }
>  
>  static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> diff --git a/reftable/stack.h b/reftable/stack.h
> index d919455669e..656f896cc28 100644
> --- a/reftable/stack.h
> +++ b/reftable/stack.h
> @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
>  
>  struct segment {
>  	size_t start, end;
> -	int log;
>  	uint64_t bytes;
>  };
>  
> -int fastlog2(uint64_t sz);
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
>  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
>  
>  #endif
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 509f4866236..e5f6ff5c9e4 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
>  	clear_dir(dir);
>  }
>  
> -static void test_log2(void)
> -{
> -	EXPECT(1 == fastlog2(3));
> -	EXPECT(2 == fastlog2(4));
> -	EXPECT(2 == fastlog2(5));
> -}
> -
> -static void test_sizes_to_segments(void)
> -{
> -	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
> -	/* .................0  1  2  3  4  5 */
> -
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(segs[2].log == 3);
> -	EXPECT(segs[2].start == 5);
> -	EXPECT(segs[2].end == 6);
> -
> -	EXPECT(segs[1].log == 2);
> -	EXPECT(segs[1].start == 2);
> -	EXPECT(segs[1].end == 5);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_empty(void)
> -{
> -	size_t seglen = 0;
> -	struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
> -	EXPECT(seglen == 0);
> -	reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_all_equal(void)
> -{
> -	uint64_t sizes[] = { 5, 5 };
> -	size_t seglen = 0;
> -	struct segment *segs =
> -		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> -	EXPECT(seglen == 1);
> -	EXPECT(segs[0].start == 0);
> -	EXPECT(segs[0].end == 2);
> -	reftable_free(segs);
> -}
> -
>  static void test_suggest_compaction_segment(void)
>  {
> -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
>  	/* .................0    1    2  3   4  5  6 */
>  	struct segment min =
>  		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> -	EXPECT(min.start == 2);
> -	EXPECT(min.end == 7);
> +	EXPECT(min.start == 1);
> +	EXPECT(min.end == 10);
>  }
>  
>  static void test_suggest_compaction_segment_nothing(void)
> @@ -884,6 +839,17 @@ static void test_empty_add(void)
>  	reftable_stack_destroy(st2);
>  }
>  
> +static int fastlog2(uint64_t sz)
> +{
> +	int l = 0;
> +	if (sz == 0)
> +		return 0;
> +	for (; sz; sz /= 2) {
> +		l++;
> +	}
> +	return l - 1;
> +}
> +
>  static void test_reftable_stack_auto_compaction(void)
>  {
>  	struct reftable_write_options cfg = { 0 };
> @@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
>  int stack_test_main(int argc, const char *argv[])
>  {
>  	RUN_TEST(test_empty_add);
> -	RUN_TEST(test_log2);
>  	RUN_TEST(test_names_equal);
>  	RUN_TEST(test_parse_names);
>  	RUN_TEST(test_read_file);
> @@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
>  	RUN_TEST(test_reftable_stack_update_index_check);
>  	RUN_TEST(test_reftable_stack_uptodate);
>  	RUN_TEST(test_reftable_stack_validate_refname);
> -	RUN_TEST(test_sizes_to_segments);
> -	RUN_TEST(test_sizes_to_segments_all_equal);
> -	RUN_TEST(test_sizes_to_segments_empty);
>  	RUN_TEST(test_suggest_compaction_segment);
>  	RUN_TEST(test_suggest_compaction_segment_nothing);
>  	return 0;
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..e6c3f94d874 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -293,12 +293,24 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
>  	test_line_count = 1 repo/.git/reftable/tables.list &&
>  
>  	test_commit -C repo --no-tag A &&
> -	test_line_count = 2 repo/.git/reftable/tables.list &&
> +	test_line_count = 1 repo/.git/reftable/tables.list &&
>  
>  	test_commit -C repo --no-tag B &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>  
> +test_expect_success 'ref transaction: alternating table sizes are compacted' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	test_commit -C repo A &&
> +	for i in $(test_seq 20)
> +	do
> +		git -C repo branch -f foo &&
> +		git -C repo branch -d foo || return 1
> +	done &&
> +	test_line_count = 2 repo/.git/reftable/tables.list
> +'
> +
>  check_fsync_events () {
>  	local trace="$1" &&
>  	shift &&
> @@ -324,7 +336,7 @@ test_expect_success 'ref transaction: writes are synced' '
>  		git -C repo -c core.fsync=reference \
>  		-c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
>  	check_fsync_events trace2.txt <<-EOF
> -	"name":"hardware-flush","count":2
> +	"name":"hardware-flush","count":4
>  	EOF
>  '
>  
> @@ -346,8 +358,8 @@ test_expect_success 'pack-refs: compacts tables' '
>  
>  	test_commit -C repo A &&
>  	ls -1 repo/.git/reftable >table-files &&
> -	test_line_count = 4 table-files &&
> -	test_line_count = 3 repo/.git/reftable/tables.list &&
> +	test_line_count = 3 table-files &&
> +	test_line_count = 2 repo/.git/reftable/tables.list &&
>  
>  	git -C repo pack-refs &&
>  	ls -1 repo/.git/reftable >table-files &&
> @@ -379,7 +391,7 @@ do
>  			umask $umask &&
>  			git init --shared=true repo &&
>  			test_commit -C repo A &&
> -			test_line_count = 3 repo/.git/reftable/tables.list
> +			test_line_count = 2 repo/.git/reftable/tables.list
>  		) &&
>  		git -C repo pack-refs &&
>  		test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
> @@ -747,12 +759,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> -	git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>  
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 3 repo/.git/reftable/tables.list &&
>  	git -C repo pack-refs &&
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
>  	test_line_count = 1 repo/.git/reftable/tables.list
>  '
>  
> @@ -760,19 +773,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> -	git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>  
> -	test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list &&
> +	test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> +	test_line_count = 3 repo/.git/reftable/tables.list &&
>  	git -C worktree pack-refs &&
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> -	test_line_count = 4 repo/.git/reftable/tables.list
> +	test_line_count = 3 repo/.git/reftable/tables.list
>  '
>  
>  test_expect_success 'worktree: creating shared ref updates main stack' '
>  	test_when_finished "rm -rf repo worktree" &&
>  	git init repo &&
>  	test_commit -C repo A &&
> +	test_commit -C repo B &&
>  
>  	git -C repo worktree add ../worktree &&
>  	git -C repo pack-refs &&
> @@ -780,7 +795,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
>  	test_line_count = 1 repo/.git/reftable/tables.list &&
>  
> -	git -C worktree update-ref refs/heads/shared HEAD &&
> +	GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD &&
>  	test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
>  	test_line_count = 2 repo/.git/reftable/tables.list
>  '
> -- 
> gitgitgadget
> 

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux