[PATCH v3 0/3] reftable/stack: use geometric table compaction

Hello again,

This is the third version of my patch series that refactors the reftable
compaction strategy to follow a geometric sequence. Changes compared
to v2:

 * Added a test to validate that the GIT_TEST_REFTABLE_NO_AUTOCOMPACTION
   environment variable works as expected.
 * Added additional clarifying comments and examples to explain how the new
   compaction strategy works.
 * Removed an outdated comment from the stack_test.c test.

Thanks for taking a look!

-Justin

Justin Tobler (3):
  reftable/stack: add env to disable autocompaction
  reftable/stack: use geometric table compaction
  reftable/stack: make segment end inclusive

 reftable/stack.c           | 124 ++++++++++++++++++-------------------
 reftable/stack.h           |   3 -
 reftable/stack_test.c      |  67 +++++---------------
 reftable/system.h          |   1 +
 t/t0610-reftable-basics.sh |  58 ++++++++++++-----
 5 files changed, 120 insertions(+), 133 deletions(-)


base-commit: c75fd8d8150afdf836b63a8e0534d9b9e3e111ba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1683%2Fjltobler%2Fjt%2Freftable-geometric-compaction-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1683/jltobler/jt/reftable-geometric-compaction-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1683

Range-diff vs v2:

 1:  cb6b152e5c8 ! 1:  2fdd8ea1133 reftable/stack: add env to disable autocompaction
     @@ reftable/stack.c: int reftable_addition_commit(struct reftable_addition *add)
      
       ## reftable/system.h ##
      @@ reftable/system.h: license that can be found in the LICENSE file or at
     - #include "strbuf.h"
     + #include "tempfile.h"
       #include "hash-ll.h" /* hash ID, sizes.*/
       #include "dir.h" /* remove_dir_recursively, for tests.*/
      +#include "parse.h"
       
       int hash_size(uint32_t id);
       
     +
     + ## t/t0610-reftable-basics.sh ##
     +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause auto-compaction' '
     + 	test_line_count = 1 repo/.git/reftable/tables.list
     + '
     + 
     ++test_expect_success 'ref transaction: environment variable disables auto-compaction' '
     ++	test_when_finished "rm -rf repo" &&
     ++
     ++	git init repo &&
     ++	test_commit -C repo A &&
     ++	for i in $(test_seq 20)
     ++	do
     ++		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
     ++	done &&
     ++	test_line_count = 23 repo/.git/reftable/tables.list &&
     ++
     ++	git -C repo update-ref foo HEAD &&
     ++	test_line_count = 1 repo/.git/reftable/tables.list
     ++'
     ++
     + check_fsync_events () {
     + 	local trace="$1" &&
     + 	shift &&
 2:  def70084523 ! 2:  7e62c2286ae reftable/stack: use geometric table compaction
     @@ Commit message
      
          Instead, to avoid unbounded growth of the table list, the compaction
          strategy is updated to ensure tables follow a geometric sequence after
     -    each operation. This is done by walking the table list in reverse index
     -    order to identify the compaction segment start and end. The compaction
     -    segment end is found by identifying the first table which has a
     -    preceding table size less than twice the current table. Next, the
     -    compaction segment start is found iterating through the remaining tables
     -    in the list checking if the previous table size is less than twice the
     -    cumulative of tables from the segment end. This ensures the correct
     -    segment start is found and that the newly compacted table does not
     -    violate the geometric sequence.
     +    each operation by individually evaluating each table in reverse index
     +    order. This strategy results in a much simpler and more robust algorithm
     +    compared to the previous one while also maintaining a minimal ordered
     +    set of tables on-disk.
      
          When creating 10 thousand references, the new strategy has no
          performance impact:
     @@ reftable/stack.c: static int segment_size(struct segment *s)
      +	 * they are already valid members of the geometric sequence. Due to the
      +	 * properties of a geometric sequence, it is not possible for the sum of
      +	 * these tables to exceed the value of the ending point table.
     ++	 *
     ++	 * Example table size sequence requiring no compaction:
     ++	 * 	64, 32, 16, 8, 4, 2, 1
     ++	 *
     ++	 * Example compaction segment end set to table with size 3:
     ++	 * 	64, 32, 16, 8, 4, 3, 1
      +	 */
      +	for (i = n - 1; i > 0; i--) {
      +		if (sizes[i - 1] < sizes[i] * 2) {
     @@ reftable/stack.c: static int segment_size(struct segment *s)
       			break;
      +		}
      +	}
     -+
     + 
     +-		min_seg.start = prev;
     +-		min_seg.bytes += sizes[prev];
      +	/*
      +	 * Find the starting table of the compaction segment by iterating
      +	 * through the remaining tables and keeping track of the accumulated
     -+	 * size of all tables seen from the segment end table.
     ++	 * size of all tables seen from the segment end table. The previous
     ++	 * table is compared to the accumulated size because the tables from the
     ++	 * segment end are merged backwards recursively.
      +	 *
      +	 * Note that we keep iterating even after we have found the first
      +	 * starting point. This is because there may be tables in the stack
      +	 * preceding that first starting point which violate the geometric
      +	 * sequence.
     ++	 *
     ++	 * Example compaction segment start set to table with size 32:
     ++	 * 	128, 32, 16, 8, 4, 3, 1
      +	 */
      +	for (; i > 0; i--) {
      +		uint64_t curr = bytes;
      +		bytes += sizes[i - 1];
     - 
     --		min_seg.start = prev;
     --		min_seg.bytes += sizes[prev];
     ++
      +		if (sizes[i - 1] < curr * 2) {
      +			seg.start = i - 1;
      +			seg.bytes = bytes;
     @@ reftable/stack_test.c: static void test_reftable_stack_hash_id(void)
       static void test_suggest_compaction_segment(void)
       {
      -	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
     +-	/* .................0    1    2  3   4  5  6 */
      +	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
     - 	/* .................0    1    2  3   4  5  6 */
       	struct segment min =
       		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
      -	EXPECT(min.start == 2);
     @@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: writes cause a
       
       	test_commit -C repo --no-tag B &&
       	test_line_count = 1 repo/.git/reftable/tables.list
     +@@ t/t0610-reftable-basics.sh: test_expect_success 'ref transaction: environment variable disables auto-compact
     + 	do
     + 		GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
     + 	done &&
     +-	test_line_count = 23 repo/.git/reftable/tables.list &&
     ++	test_line_count = 22 repo/.git/reftable/tables.list &&
     + 
     + 	git -C repo update-ref foo HEAD &&
     + 	test_line_count = 1 repo/.git/reftable/tables.list
       '
       
      +test_expect_success 'ref transaction: alternating table sizes are compacted' '
 3:  a23e3fc6972 ! 3:  9a33914c852 reftable/segment: make segment end inclusive
     @@ Metadata
      Author: Justin Tobler <jltobler@xxxxxxxxx>
      
       ## Commit message ##
     -    reftable/segment: make segment end inclusive
     +    reftable/stack: make segment end inclusive
      
          For a reftable segment, the start of the range is inclusive and the end
          is exclusive. In practice we increment the end when creating the

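For readers following the range-diff, here is a minimal standalone sketch of
the two loops described in the comments quoted above. It is an illustration
under stated assumptions, not the code in reftable/stack.c: the names
`sketch_segment` and `sketch_suggest_segment` are hypothetical, `end` is
treated as exclusive, and the `n == 0` guard is my own addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch; `end` is exclusive here. */
struct sketch_segment {
	size_t start;   /* index of the first table to compact */
	size_t end;     /* one past the last table to compact */
	uint64_t bytes; /* combined size of the tables in [start, end) */
};

/*
 * Suggest which tables to compact so that the sizes, read from index 0
 * upwards, keep each table at least twice as large as the next one.
 * An empty result (start == end == 0) means nothing needs compacting.
 */
struct sketch_segment sketch_suggest_segment(const uint64_t *sizes, size_t n)
{
	struct sketch_segment seg = { 0, 0, 0 };
	uint64_t bytes = 0;
	size_t i;

	if (n == 0) /* assumption: guard against size_t underflow below */
		return seg;

	/*
	 * Walk backwards to find the segment end: the first table whose
	 * predecessor is less than twice its size.
	 */
	for (i = n - 1; i > 0; i--) {
		if (sizes[i - 1] < sizes[i] * 2) {
			seg.end = i + 1;
			bytes = sizes[i];
			break;
		}
	}

	/*
	 * Keep walking backwards, comparing each preceding table against
	 * the accumulated size seen so far. Every match moves the start
	 * further back; we do not stop at the first one.
	 */
	for (; i > 0; i--) {
		uint64_t curr = bytes;
		bytes += sizes[i - 1];

		if (sizes[i - 1] < curr * 2) {
			seg.start = i - 1;
			seg.bytes = bytes;
		}
	}

	return seg;
}
```

Under these assumptions, the sizes 128, 32, 16, 8, 4, 3, 1 yield start = 1
and end = 6 (the tables of size 32 through 3, 63 in total), while
64, 32, 16, 8, 4, 2, 1 yields an empty segment.
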
-- 
gitgitgadget



