On Thu, Mar 21, 2024 at 10:40:18PM +0000, Justin Tobler via GitGitGadget wrote: > From: Justin Tobler <jltobler@xxxxxxxxx> > > To reduce the number of on-disk reftables, compaction is performed. > Contiguous tables with the same binary log value of size are grouped > into segments. The segment that has both the lowest binary log value and > contains more than one table is set as the starting point when > identifying the compaction segment. > > Since segments containing a single table are not initially considered > for compaction, if the table appended to the list does not match the > previous table log value, no compaction occurs for the new table. It is > therefore possible for unbounded growth of the table list. This can be > demonstrated by repeating the following sequence: > > git branch -f foo > git branch -d foo > > Each operation results in a new table being written with no compaction > occurring until a separate operation produces a table matching the > previous table log value. > > Instead, to avoid unbounded growth of the table list, the compaction > strategy is updated to ensure tables follow a geometric sequence after > each operation. This is done by walking the table list in reverse index > order to identify the compaction segment start and end. The compaction > segment end is found by identifying the first table which has a > preceding table size less than twice the current table. Next, the > compaction segment start is found iterating through the remaining tables > in the list checking if the previous table size is less than twice the > cumulative of tables from the segment end. This ensures the correct > segment start is found and that the newly compacted table does not > violate the geometric sequence. I don't think we need to go into so much detail how exactly the algorithm is working -- these kind of comments should ideally exist in the code. What would be more interesting to explain here is _why_ we chose the new algorithm over the old one instead of just trying to fix the issue. Other than that this patch LGTM. Patrick > When creating 10 thousand references, the new strategy has no > performance impact: > > Benchmark 1: update-ref: create refs sequentially (revision = HEAD~) > Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s] > Range (min … max): 26.447 s … 26.569 s 10 runs > > Benchmark 2: update-ref: create refs sequentially (revision = HEAD) > Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s] > Range (min … max): 26.366 s … 26.444 s 10 runs > > Summary > update-ref: create refs sequentially (revision = HEAD) ran > 1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~) > > Some tests in `t0610-reftable-basics.sh` assert the on-disk state of > tables and are therefore updated to specify the correct new table count. > Since compaction is more aggressive in ensuring tables maintain a > geometric sequence, the expected table count is reduced in these tests. > In `reftable/stack_test.c` tests related to `sizes_to_segments()` are > removed because the function is no longer needed. Also, the > `test_suggest_compaction_segment()` test is updated to better showcase > and reflect the new geometric compaction behavior. > > Signed-off-by: Justin Tobler <jltobler@xxxxxxxxx> > --- > reftable/stack.c | 109 ++++++++++++++++--------------------- > reftable/stack.h | 3 - > reftable/stack_test.c | 66 +++++----------------- > t/t0610-reftable-basics.sh | 43 ++++++++++----- > 4 files changed, 91 insertions(+), 130 deletions(-) > > diff --git a/reftable/stack.c b/reftable/stack.c > index 2370d93d13b..ef55dc75cde 100644 > --- a/reftable/stack.c > +++ b/reftable/stack.c > @@ -1214,75 +1214,62 @@ static int segment_size(struct segment *s) > return s->end - s->start; > } > > -int fastlog2(uint64_t sz) > -{ > - int l = 0; > - if (sz == 0) > - return 0; > - for (; sz; sz /= 2) { > - l++; > - } > - return l - 1; > -} > - > -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n) > -{ > - struct segment *segs = reftable_calloc(n, sizeof(*segs)); > - struct segment cur = { 0 }; > - size_t next = 0, i; > - > - if (n == 0) { > - *seglen = 0; > - return segs; > - } > - for (i = 0; i < n; i++) { > - int log = fastlog2(sizes[i]); > - if (cur.log != log && cur.bytes > 0) { > - struct segment fresh = { > - .start = i, > - }; > - > - segs[next++] = cur; > - cur = fresh; > - } > - > - cur.log = log; > - cur.end = i + 1; > - cur.bytes += sizes[i]; > - } > - segs[next++] = cur; > - *seglen = next; > - return segs; > -} > - > struct segment suggest_compaction_segment(uint64_t *sizes, size_t n) > { > - struct segment min_seg = { > - .log = 64, > - }; > - struct segment *segs; > - size_t seglen = 0, i; > - > - segs = sizes_to_segments(&seglen, sizes, n); > - for (i = 0; i < seglen; i++) { > - if (segment_size(&segs[i]) == 1) > - continue; > + struct segment seg = { 0 }; > + uint64_t bytes; > + size_t i; > > - if (segs[i].log < min_seg.log) > - min_seg = segs[i]; > - } > + /* > + * If there are no tables or only a single one then we don't have to > + * compact anything. The sequence is geometric by definition already. > + */ > + if (n <= 1) > + return seg; > > - while (min_seg.start > 0) { > - size_t prev = min_seg.start - 1; > - if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) > + /* > + * Find the ending table of the compaction segment needed to restore the > + * geometric sequence. > + * > + * To do so, we iterate backwards starting from the most recent table > + * until a valid segment end is found. If the preceding table is smaller > + * than the current table multiplied by the geometric factor (2), the > + * current table is set as the compaction segment end. > + * > + * Tables after the ending point are not added to the byte count because > + * they are already valid members of the geometric sequence. Due to the > + * properties of a geometric sequence, it is not possible for the sum of > + * these tables to exceed the value of the ending point table. > + */ > + for (i = n - 1; i > 0; i--) { > + if (sizes[i - 1] < sizes[i] * 2) { > + seg.end = i + 1; > + bytes = sizes[i]; > break; > + } > + } > + > + /* > + * Find the starting table of the compaction segment by iterating > + * through the remaining tables and keeping track of the accumulated > + * size of all tables seen from the segment end table. > + * > + * Note that we keep iterating even after we have found the first > + * starting point. This is because there may be tables in the stack > + * preceding that first starting point which violate the geometric > + * sequence. > + */ > + for (; i > 0; i--) { > + uint64_t curr = bytes; > + bytes += sizes[i - 1]; > > - min_seg.start = prev; > - min_seg.bytes += sizes[prev]; > + if (sizes[i - 1] < curr * 2) { > + seg.start = i - 1; > + seg.bytes = bytes; > + } > } > > - reftable_free(segs); > - return min_seg; > + return seg; > } > > static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st) > diff --git a/reftable/stack.h b/reftable/stack.h > index d919455669e..656f896cc28 100644 > --- a/reftable/stack.h > +++ b/reftable/stack.h > @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines); > > struct segment { > size_t start, end; > - int log; > uint64_t bytes; > }; > > -int fastlog2(uint64_t sz); > -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n); > struct segment suggest_compaction_segment(uint64_t *sizes, size_t n); > > #endif > diff --git a/reftable/stack_test.c b/reftable/stack_test.c > index 509f4866236..e5f6ff5c9e4 100644 > --- a/reftable/stack_test.c > +++ b/reftable/stack_test.c > @@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void) > clear_dir(dir); > } > > -static void test_log2(void) > -{ > - EXPECT(1 == fastlog2(3)); > - EXPECT(2 == fastlog2(4)); > - EXPECT(2 == fastlog2(5)); > -} > - > -static void test_sizes_to_segments(void) > -{ > - uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 }; > - /* .................0 1 2 3 4 5 */ > - > - size_t seglen = 0; > - struct segment *segs = > - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); > - EXPECT(segs[2].log == 3); > - EXPECT(segs[2].start == 5); > - EXPECT(segs[2].end == 6); > - > - EXPECT(segs[1].log == 2); > - EXPECT(segs[1].start == 2); > - EXPECT(segs[1].end == 5); > - reftable_free(segs); > -} > - > -static void test_sizes_to_segments_empty(void) > -{ > - size_t seglen = 0; > - struct segment *segs = sizes_to_segments(&seglen, NULL, 0); > - EXPECT(seglen == 0); > - reftable_free(segs); > -} > - > -static void test_sizes_to_segments_all_equal(void) > -{ > - uint64_t sizes[] = { 5, 5 }; > - size_t seglen = 0; > - struct segment *segs = > - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes)); > - EXPECT(seglen == 1); > - EXPECT(segs[0].start == 0); > - EXPECT(segs[0].end == 2); > - reftable_free(segs); > -} > - > static void test_suggest_compaction_segment(void) > { > - uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 }; > + uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }; > /* .................0 1 2 3 4 5 6 */ > struct segment min = > suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); > - EXPECT(min.start == 2); > - EXPECT(min.end == 7); > + EXPECT(min.start == 1); > + EXPECT(min.end == 10); > } > > static void test_suggest_compaction_segment_nothing(void) > @@ -884,6 +839,17 @@ static void test_empty_add(void) > reftable_stack_destroy(st2); > } > > +static int fastlog2(uint64_t sz) > +{ > + int l = 0; > + if (sz == 0) > + return 0; > + for (; sz; sz /= 2) { > + l++; > + } > + return l - 1; > +} > + > static void test_reftable_stack_auto_compaction(void) > { > struct reftable_write_options cfg = { 0 }; > @@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void) > int stack_test_main(int argc, const char *argv[]) > { > RUN_TEST(test_empty_add); > - RUN_TEST(test_log2); > RUN_TEST(test_names_equal); > RUN_TEST(test_parse_names); > RUN_TEST(test_read_file); > @@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[]) > RUN_TEST(test_reftable_stack_update_index_check); > RUN_TEST(test_reftable_stack_uptodate); > RUN_TEST(test_reftable_stack_validate_refname); > - RUN_TEST(test_sizes_to_segments); > - RUN_TEST(test_sizes_to_segments_all_equal); > - RUN_TEST(test_sizes_to_segments_empty); > RUN_TEST(test_suggest_compaction_segment); > RUN_TEST(test_suggest_compaction_segment_nothing); > return 0; > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > index 686781192eb..e6c3f94d874 100755 > --- a/t/t0610-reftable-basics.sh > +++ b/t/t0610-reftable-basics.sh > @@ -293,12 +293,24 @@ test_expect_success 'ref transaction: writes cause auto-compaction' ' > test_line_count = 1 repo/.git/reftable/tables.list && > > test_commit -C repo --no-tag A && > - test_line_count = 2 repo/.git/reftable/tables.list && > + test_line_count = 1 repo/.git/reftable/tables.list && > > test_commit -C repo --no-tag B && > test_line_count = 1 repo/.git/reftable/tables.list > ' > > +test_expect_success 'ref transaction: alternating table sizes are compacted' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + test_commit -C repo A && > + for i in $(test_seq 20) > + do > + git -C repo branch -f foo && > + git -C repo branch -d foo || return 1 > + done && > + test_line_count = 2 repo/.git/reftable/tables.list > +' > + > check_fsync_events () { > local trace="$1" && > shift && > @@ -324,7 +336,7 @@ test_expect_success 'ref transaction: writes are synced' ' > git -C repo -c core.fsync=reference \ > -c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD && > check_fsync_events trace2.txt <<-EOF > - "name":"hardware-flush","count":2 > + "name":"hardware-flush","count":4 > EOF > ' > > @@ -346,8 +358,8 @@ test_expect_success 'pack-refs: compacts tables' ' > > test_commit -C repo A && > ls -1 repo/.git/reftable >table-files && > - test_line_count = 4 table-files && > - test_line_count = 3 repo/.git/reftable/tables.list && > + test_line_count = 3 table-files && > + test_line_count = 2 repo/.git/reftable/tables.list && > > git -C repo pack-refs && > ls -1 repo/.git/reftable >table-files && > @@ -379,7 +391,7 @@ do > umask $umask && > git init --shared=true repo && > test_commit -C repo A && > - test_line_count = 3 repo/.git/reftable/tables.list > + test_line_count = 2 repo/.git/reftable/tables.list > ) && > git -C repo pack-refs && > test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list && > @@ -747,12 +759,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' ' > test_when_finished "rm -rf repo worktree" && > git init repo && > test_commit -C repo A && > - git -C repo worktree add ../worktree && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && > > - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && > - test_line_count = 4 repo/.git/reftable/tables.list && > + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && > + test_line_count = 3 repo/.git/reftable/tables.list && > git -C repo pack-refs && > - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && > + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 1 repo/.git/reftable/tables.list > ' > > @@ -760,19 +773,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' ' > test_when_finished "rm -rf repo worktree" && > git init repo && > test_commit -C repo A && > - git -C repo worktree add ../worktree && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD && > > - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list && > - test_line_count = 4 repo/.git/reftable/tables.list && > + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list && > + test_line_count = 3 repo/.git/reftable/tables.list && > git -C worktree pack-refs && > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > - test_line_count = 4 repo/.git/reftable/tables.list > + test_line_count = 3 repo/.git/reftable/tables.list > ' > > test_expect_success 'worktree: creating shared ref updates main stack' ' > test_when_finished "rm -rf repo worktree" && > git init repo && > test_commit -C repo A && > + test_commit -C repo B && > > git -C repo worktree add ../worktree && > git -C repo pack-refs && > @@ -780,7 +795,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' ' > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 1 repo/.git/reftable/tables.list && > > - git -C worktree update-ref refs/heads/shared HEAD && > + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD && > test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list && > test_line_count = 2 repo/.git/reftable/tables.list > ' > -- > gitgitgadget >
Attachment:
signature.asc
Description: PGP signature